DLT Record Final
Description:
Image recognition is a mechanism used to identify an object within an image and to classify it into a specific category,
modeled loosely on the way humans recognize objects across different sets of images.
Convolutional Neural Networks (CNNs) are a class of deep learning models designed to automatically learn and extract
hierarchical features from images. CNNs consist of layers that perform convolution, pooling, and fully connected
operations. Convolutional layers apply filters to input data, capturing local patterns and edges. Pooling layers
downsample feature maps, retaining important information while reducing computation. Fully connected layers make
decisions based on the learned features. CNNs excel in image classification, object detection, and segmentation tasks
due to their ability to capture spatial hierarchies of features.
Before we train a CNN model, let’s build a basic, fully connected neural network for the dataset. Among the basic steps to
build an image classification model using a neural network is building a model architecture (Sequential) with Dense
(fully connected) layers; the data loading, preprocessing, training, and evaluation steps appear in the code below.
The CIFAR-10 dataset consists of 60,000 32 x 32 color images in 10 classes, with 6,000 images per class. There are
50,000 training images and 10,000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains
exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random
order, but some training batches may contain more images from one class than another. Between them, the training
batches contain exactly 5000 images from each class.
The important points that distinguish this dataset from MNIST are:
50,000 training images and 10,000 testing images. The classes are completely mutually exclusive. There is no overlap
between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big
trucks. Neither includes pickup trucks.
Source code:
Import the required Libraries
This code imports essential libraries for data manipulation, visualization, and machine learning in Python, including NumPy,
Pandas, Matplotlib, Seaborn, OpenCV, and TensorFlow with Keras. It also includes modules for image recognition and
tools for creating and working with convolutional neural networks.
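The import statements themselves are not reproduced in this record; based on the description above, they would be along these lines (the exact aliases and the seaborn/OpenCV imports are assumptions taken from the description):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
from PIL import Image
import tensorflow as tf
import keras
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, BatchNormalization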
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
Unzip dataset
This command unzips the "cifar10.zip" file quietly in the specified Google Drive directory, commonly used to extract
datasets in image recognition tasks.
!unzip -q "/content/drive/MyDrive/deepl/cifar10.zip"
The below code Reads a CSV file containing labels for CIFAR-10 dataset images into a Pandas DataFrame, setting the first
column as the index. Prints the label of the image at index 5 in the CIFAR-10 dataset, providing information about the
content of the image. labels.shape: Returns the shape of the DataFrame, indicating the number of rows and columns,
which can give insights into the size and structure of the label data associated with the images.
labels = pd.read_csv('/content/drive/MyDrive/DLT/cifar10Labels.csv', index_col=0)  # labels CSV; path taken from the later experiment in this record
# View an image
img_idx = 5
print(labels.label[img_idx])
Image.open('cifar10/'+str(img_idx)+'.png')
labels.shape
automobile
(50000, 1)
The code splits the CIFAR-10 labels into training and testing sets, and then captures and stores the
corresponding indexes, ensuring consistent splits for subsequent image recognition tasks.
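The split itself is not shown at this point in the record; it would match the call used later in Experiment 3:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(labels.index, labels.label, test_size=0.3, random_state=42)
train_idx, test_idx = y_train.index, y_test.index   # storing indexes for later use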
The below code reads and processes images from the CIFAR-10 dataset for training. It creates a NumPy array X_train
containing the float32 representations of the training images, ready for use in image recognition model training.
# Reading images for training
temp = []
for img_idx in y_train.index:
    img = np.array(Image.open(os.path.join('cifar10/', str(img_idx) + '.png'))).astype('float32')
    temp.append(img)
X_train = np.stack(temp)
This code reads and converts images from the test set into a NumPy array, facilitating their use as input for image
recognition models during the testing phase. The resulting array X_test contains the pixel data of the test images.
temp = []
for img_idx in y_test.index:
    temp.append(np.array(Image.open('cifar10/' + str(img_idx) + '.png')).astype('float32'))
X_test = np.stack(temp)
Displays a subset of the normalized testing image data, showing the pixel values after normalization, which is crucial for
efficient training of neural networks in image recognition tasks.
y_train.shape
X_test[6:7]
(array output truncated in the record)
Converts the encoded numerical labels into one-hot encoded format using Keras utility function to_categorical, creating a
binary matrix representation for each class, suitable for training a multi-class image recognition model.
from sklearn.preprocessing import LabelEncoder
encode_X = LabelEncoder()
encode_X_fit = encode_X.fit_transform(y_train)
y_train = keras.utils.to_categorical(encode_X_fit)
The code constructs a CNN model with two convolutional layers, each followed by batch normalization and max-pooling, a
flattening layer, and a fully-connected layer with softmax activation for classification into 10 classes. Regularization is
applied to the convolutional layers for improved generalization.
num_classes = 10
model = keras.models.Sequential([
    # Adding first convolutional layer (filter counts and kernel size follow the description; exact values are assumed)
    keras.layers.Conv2D(filters=32, kernel_size=(4, 4), activation='relu',
                        kernel_regularizer=keras.regularizers.l2(0.001), input_shape=(32, 32, 3), name='Conv_1'),
    # Normalizing the parameters from last layer to speed up the performance (optional)
    keras.layers.BatchNormalization(name='BN_1'),
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_1'),
    # Adding second convolutional layer
    keras.layers.Conv2D(filters=64, kernel_size=(4, 4), activation='relu',
                        kernel_regularizer=keras.regularizers.l2(0.001), name='Conv_2'),
    keras.layers.BatchNormalization(name='BN_2'),
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_2'),
    keras.layers.Flatten(name='Flat'),
    # Fully-Connected layer
    keras.layers.Dense(num_classes, activation='softmax', name='pred_layer')
])
The summary offers a comprehensive overview of the model architecture, aiding in understanding layer configurations
and parameter counts. It helps ensure the proper design and efficient training of the Convolutional Neural Network for
image recognition.
model.summary()
Model: "sequential"
(layer-by-layer summary output truncated in the record)
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
cpfile = r'CIFAR10_checkpoint.hdf5'   # weights stored in HDF5 format
cb_checkpoint = keras.callbacks.ModelCheckpoint(cpfile, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
# Training call inferred from the "Epoch 1/5" output below
model.fit(X_train, y_train, epochs=5, validation_split=0.2, callbacks=[cb_checkpoint])
Epoch 1/5
875/875 [==============================] - 66s 76ms/step - loss: 0.8606 - accuracy: 0.7212 - val_loss: 1.3002 -
val_accuracy: 0.590
<keras.src.callbacks.History at 0x7b164663da80>
Creates a Pandas DataFrame with columns 'predicted' and 'actual' to compare predicted and actual labels for the first 10
test images, aiding in model evaluation and result visualization
# << DeprecationWarning: The truth value of an empty array is ambiguous >> can arise due to a NumPy version higher
than 1.13.3.
#pred = encode_X.inverse_transform(model.predict_classes(X_test[:10]))
pred = np.argmax(model.predict(X_test[:10]), axis=-1)
pred = encode_X.inverse_transform(pred)
act = y_test[:10]
pd.DataFrame({'predicted': pred, 'actual': act.values})
cat horse
ship ship
frog airplane
frog frog
cat automobile
frog frog
ship ship
frog dog
Prints the training and testing accuracy scores, rounded to five decimal places, providing a quantitative measure of the
model's performance on both datasets.
# Printing the train and test accuracy
from mlxtend.evaluate import scoring
train_acc = scoring(encode_X.inverse_transform(np.argmax(model.predict(X_train), axis=-1)),
                    encode_X.inverse_transform([np.argmax(x) for x in y_train]), metric='accuracy')
print('Train accuracy:', round(train_acc, 5))
The code uses mlxtend to compute and plot confusion matrices for training and testing datasets, providing insights into
the model's performance by visualizing the distribution of predicted and actual class labels.
from mlxtend.evaluate import confusion_matrix
from mlxtend.plotting import plot_confusion_matrix
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
tick_marks = np.arange(len(class_names))
# Train confusion matrix (integer-encoded labels are used here to keep the mlxtend call simple)
cm = confusion_matrix(y_target=np.argmax(y_train, axis=-1),
                      y_predicted=np.argmax(model.predict(X_train), axis=-1), binary=False)
plot_confusion_matrix(conf_mat=cm, class_names=class_names, figsize=(8, 8))
# Test confusion matrix (y_test still holds string labels, so they are encoded first)
cm = confusion_matrix(y_target=encode_X.transform(y_test),
                      y_predicted=np.argmax(model.predict(X_test), axis=-1), binary=False)
plot_confusion_matrix(conf_mat=cm, class_names=class_names, figsize=(8, 8))
Description:
Age Detection : Our goal here is to create a program that will predict the age group of the person using an image.
We will use the Indian Movie Face Database (IMFDB) created by Shankar Setty et al. as a benchmark for facial
recognition with wide variation. The database consists of thousands of images of 50+ actors taken from more than 100
videos. Since the database has been created manually by cropping the images from the videos, there’s high variability in
terms of pose, expression, illumination, resolution, etc. The original database provides many attributes, including:
In this scenario, we will use a cleaned and formatted data set with 26742 images split as 19906 train images and 6636
test images respectively. The target here is to use the images and predict the age of the actor/actress within the
available classes i.e. young, middle and old making it a multi-class classification problem.
We will resize all the images to a 32 x 32 shape. All the images have red, green, and blue color components; therefore, the
final shape becomes 32 x 32 x 3, giving us a total of 3,072 nodes for the input layer.
Next, we will choose one hidden layer to start with, along with 500 nodes, making a total of 1,536,500 parameters
(3,072 x 500 weights plus 500 biases) between the input and the hidden layer. We will use the ReLU activation function in this layer.
Next, we have the output layer with only three classes and hence three nodes, making a total of 1,503 parameters
(500 x 3 weights plus 3 biases) between the hidden and output layer. In this layer, we will use the Softmax activation function.
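A minimal sketch of the network just described (not code from the record; it uses the Keras layers that are imported later in this section):
from keras.models import Sequential
from keras.layers import InputLayer, Flatten, Dense

model = Sequential([
    InputLayer(input_shape=(32, 32, 3)),   # 32 x 32 x 3 = 3,072 input values
    Flatten(),
    Dense(500, activation='relu'),         # hidden layer: 3,072 x 500 weights + 500 biases
    Dense(3, activation='softmax'),        # output layer: young / middle / old
])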
Dataset:
You can download the train and test data sets. In each directory, you will find a folder consisting of images along with a
CSV file which has two columns, ID and Class. The ID column consists of image names like 352.jpg, and the Class column
holds the respective image character’s age group, like Old.
The training data is the biggest (in size) subset of the original dataset, which is used to train or fit the machine learning
model. Firstly, the training data is fed to the ML algorithms, which lets them learn how to make predictions for the given
task.
Once we train the model with the training dataset, it's time to test the model with the test dataset. This dataset
evaluates the performance of the model and ensures that the model can generalize well with the new or unseen
dataset. The test dataset is another subset of original data, which is independent of the training dataset.
Splitting the dataset into train and test sets is one of the important parts of data pre-processing, as by doing so, we can
improve the performance of our model and hence give better predictability.
Therefore, if we train the model on one dataset and test it on an entirely unrelated dataset, performance will suffer.
Hence it is important to split a single dataset into two parts, i.e., a train set and a test set.
In this way, we can easily evaluate the performance of our model. For example, if it performs well on the training data but
does not perform well on the test dataset, then the model is likely overfitted.
The main difference between training data and testing data is that training data is the subset of original data that is used
to train the machine learning model, whereas testing data is used to check the accuracy of the model.
The training dataset is generally larger in size compared to the testing dataset.
In a dataset, a training set is implemented to build up a model, while a test (or validation) set is to validate the model
built. Data points in the training set are excluded from the test (validation) set.
Source Code:
Age Detection of Indian Actors
The line import sys is a Python statement that imports the sys module into your script or interactive session. The sys
module is part of the Python standard library and provides access to some variables and functions that interact with the
Python interpreter.
Purpose: sys stands for "system." The module provides access to some variables used or maintained by the Python
interpreter and functions that interact with the interpreter. The code sys.modules[__name__].__dict__.clear() is a way to clear the
namespace (dictionary of names) of the current Python module. Let's break down the terms used in the code:
sys: The sys module is a part of the Python standard library that provides access to some variables used or maintained by
the interpreter and functions that interact with the interpreter. In this case, it's used to access the modules attribute.
sys.modules: This is a dictionary that maps module names to module objects. It's a global dictionary that keeps track of all
loaded modules in the current Python process.
[__name__]: __name__ is a special variable in Python that is automatically set by the interpreter. When a Python script or module is
executed, __name__ is set to '__main__' if the script is being run as the main program. If the module is being imported, __name__ is set
to the module's name. So, sys.modules[__name__] retrieves the module object for the current module.
.__dict__: The __dict__ attribute of a module is a dictionary that holds the symbol table of the module. It contains all the names
(variables, functions, classes, etc.) defined in the module.
.clear(): This method is called on a dictionary and clears all its elements, effectively removing all names from the module's
symbol table.
So, the overall meaning of the line of code is: "Access the global dictionary of loaded modules (sys.modules), get the
module object for the current module (sys.modules[__name__]), access its symbol table (__dict__), and clear that symbol table
(clear()), which effectively removes all names defined in the module."
import sys
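For reference, the namespace-clearing statement described above, written with the double underscores that the record's formatting stripped out, is:
sys.modules[__name__].__dict__.clear()   # removes every name defined in the current module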
import os: The os module provides a way of interacting with the operating system. It's commonly used for tasks like file
and directory manipulation.
import numpy as np: NumPy is a library for numerical computing in Python. It provides support for large, multi-dimensional
arrays and matrices, along with mathematical functions to operate on these elements.
import pandas as pd: Pandas is a data manipulation library. It provides data structures like DataFrame for efficient data
analysis and manipulation.
import matplotlib.pyplot as plt: Matplotlib is a plotting library. The pyplot module provides a convenient interface for
creating various types of plots and charts.
%matplotlib inline: This is a Jupyter notebook magic command that ensures that Matplotlib plots are displayed inline
within the notebook.
from tensorflow.python.keras import utils: This imports utility functions from the TensorFlow library for tasks related to
neural networks. In this case, it might be used for one-hot encoding or other preprocessing tasks.
from keras.models import Sequential: Keras is a high-level neural networks API. This import statement brings in the
Sequential class, which is used to build a linear stack of neural network layers.
from keras.layers import Dense, Flatten, InputLayer: This imports specific layer types (Dense, Flatten, InputLayer) from
Keras. These layers are commonly used in the construction of neural network architectures.
import keras: Importing the Keras library itself. It might be redundant here if you are already importing specific
components from Keras.
import imageio: ImageIO is a library for reading and writing images in various formats.
from PIL import Image: PIL (Python Imaging Library) is a library for opening, manipulating, and saving many different image
file formats. In this case, it's used for image resizing.
%matplotlib inline
import imageio
# To read images
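Only %matplotlib inline and the imageio import survive in the record; the remaining imports described above would be:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.python.keras import utils
from keras.models import Sequential
from keras.layers import Dense, Flatten, InputLayer
import keras
from PIL import Image   # used for image resizing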
It is specific to the Google Colab environment, which is a free, cloud-based platform for running Python code, especially
popular in the context of machine learning and data science.
!: In Jupyter Notebooks or Colab, the exclamation mark ! is used to run shell commands directly from the notebook cells.
-q: This flag stands for "quiet" and is used to suppress the output of the unzip command. It means the command will not
print the names of the files and directories as they are being extracted.
"/content/drive/MyDrive/DLT/agedetectiontrain.zip" and "/content/drive/MyDrive/DLT/agedetectiontest.zip": These are the
paths to the ZIP files that you want to unzip, specified as full paths to the archives stored in your Google Drive under the
"/content/drive/MyDrive/DLT/" directory.
When you run these commands in a Colab notebook, they unzip the contents of the two archives into the current
directory. The -q flag is used to do this quietly, without printing each file as it's being extracted.
!unzip -q "/content/drive/MyDrive/DLT/agedetectiontrain.zip"
!unzip -q "/content/drive/MyDrive/DLT/agedetectiontest.zip"
The provided code reads data from CSV files ('train.csv' and 'test.csv') using the pandas library in Python.
Reading Training Data: train = pd.read_csv('train.csv') This line reads the contents of the
'train.csv' file and creates a pandas DataFrame named train to store the data. A DataFrame is a two-dimensional tabular
data structure in pandas that can hold data of different types. The
assumption here is that your training data is stored in a CSV (Comma-Separated Values) file, and pandas is used to read it
into a structured format.
Reading Test Data: test = pd.read_csv('test.csv') Similarly, this line reads the contents of the 'test.csv' file and creates a
pandas DataFrame named test to store the test data.
After running these lines, you'll have two DataFrames (train and test), each containing the data from its respective CSV file.
You can then explore, analyze, and preprocess the data using the pandas library and proceed with your machine learning
or data analysis tasks.
This code snippet generates a random index (idx) from the training dataset, retrieves the image file name and
corresponding age group label from the training data, reads the image using imageio, and then displays the image along
with its associated age group using Matplotlib.
Setting Random Seed: np.random.seed(10) This line sets the random seed to ensure
reproducibility. By setting the seed, you ensure that the random number generation is the same each time you run the
code. This is useful when you want to obtain the same random results for debugging or analysis purposes.
Choosing a Random Index: idx = np.random.choice(train.index) This line selects a random index from the training dataset.
The np.random.choice function is used to randomly choose an index from the indices of the training dataset.
Getting Image Information: img_name = train.ID[idx] and img = imageio.imread(os.path.join('Train', img_name)) These lines
retrieve the image file name (img_name) and read the corresponding image using imageio. The image file is assumed to be
located in a directory named 'Train'. The os.path.join function is used to create the complete path to the image file.
Displaying the Image: print(train.Class[idx]) prints the age group associated with the randomly selected image, and then
plt.imshow(img), plt.axis('off'), and plt.show() use Matplotlib to display it. The plt.axis('off') call removes axis labels and
ticks for a cleaner display.
plt.imshow(img)
plt.axis('off')
plt.show()
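Collected into one runnable cell, the snippet described above would be (assuming the train DataFrame and the unzipped 'Train' image folder):
np.random.seed(10)                               # reproducible random choice
idx = np.random.choice(train.index)              # pick a random training index
img_name = train.ID[idx]
img = imageio.imread(os.path.join('Train', img_name))
print(train.Class[idx])                          # age group of the chosen image
plt.imshow(img)
plt.axis('off')
plt.show()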
EXPERIMENT-3
AIM: Design a CNN for Image Recognition which includes hyperparameter tuning
DESCRIPTION:
A CNN can have multiple layers, each of which learns to detect the different features of an input image. A
filter or kernel is applied to each image to produce an output that gets progressively better and more detailed
after each layer. In the lower layers, the filters can start as simple features.
Convolutional Neural Networks (CNNs) are a class of deep neural networks commonly used for image
recognition and analysis, but they are also applied to various other tasks like natural language processing and
speech recognition. Hyperparameters play a crucial role in the performance of CNNs, and tuning them
effectively is essential for achieving optimal results
CNNs have achieved state-of-the-art performance on a wide range of image recognition tasks, including object
classification, object detection, and image segmentation. They are widely used in computer vision, image
processing, and other related fields, and have been applied to a wide range of applications, including self-
driving cars, medical imaging, and security systems.
Hyperparameter Tuning
The first hyperparameter to tune is the number of neurons in each hidden layer. In this case, the number of
neurons in every layer is set to be the same, though it can also be made different. The number of neurons should be
adjusted to the complexity of the solution: a task that is more complex to predict needs more neurons. The
range for the number of neurons is set from 10 to 100.
An activation function is a parameter in each layer. Input data are fed to the input layer, followed by hidden
layers, and the final output layer. The output layer contains the output value. The input values moving from a
layer to another layer keep changing according to the activation function. The activation function decides how
to compute the input values of a layer into output values. The output values of a layer are then passed to the
next layer as input values again. The next layer then computes the values into output values for another layer
again. There are 9 activation functions to try in this demonstration. Each activation function has its own
formula (and graph) for computing its output from the input values.
The layers of a neural network are compiled and an optimizer is assigned. The optimizer is responsible for adjusting
the learning rate and weights of the neurons in the neural network to reach the minimum loss function.
The optimizer is very important for achieving the highest possible accuracy or minimum loss. There are 7 optimizers
to choose from, each with a different concept behind it.
One of the hyperparameters in the optimizer is the learning rate. We will also tune the learning rate. Learning
rate controls the step size for a model to reach the minimum loss function. A higher learning rate makes the
model learn faster, but it may miss the minimum loss function and only reach the surrounding of it. A lower
learning rate gives a better chance to find a minimum loss function. As a tradeoff lower learning rate needs
higher epochs, or more time and memory capacity resources.
If the observation size of the training dataset is too large, it will definitely take a longer time to build the
model. To make the model learn faster, we can assign batch size so that not all of the training data are given
to the model at the same time. Batch size is the number of training data sub-samples given to the model per update. If the
training dataset has 77,500 observations and the batch size is 1,000, the model will learn 77 times from batches of 1,000
training samples and one last time from the remaining 500 samples. A smaller batch
size makes the learning process faster, but the variance of the validation dataset accuracy is higher. A bigger
batch size gives a slower learning process, but the validation dataset accuracy has a lower variance.
The number of times a whole dataset is passed through the neural network model is called an epoch. One
epoch means that the training dataset is passed forward and backward through the neural network once. A
too-small number of epochs results in underfitting because the neural network has not learned enough. The
training dataset needs to pass through multiple times, so multiple epochs are required. On the other hand, too
many epochs lead to overfitting, where the model predicts the training data very well but cannot predict new,
unseen data well enough. The number of epochs must be tuned to obtain the optimal result. This demonstration
searches for a suitable number of epochs between 20 and 100.
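As an illustrative sketch only (this code is not part of the record), a manual search over a couple of these hyperparameters could look like the following; it assumes X_train and y_train have already been prepared as in the CIFAR-10 code below:
import keras
from keras import layers

def build_model(n_neurons=64, activation='relu', learning_rate=1e-3):
    # Small CNN whose dense width, activation, and learning rate are tunable
    model = keras.models.Sequential([
        layers.Conv2D(32, (3, 3), activation=activation, input_shape=(32, 32, 3)),
        layers.MaxPool2D((2, 2)),
        layers.Flatten(),
        layers.Dense(n_neurons, activation=activation),
        layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

best = None
for lr in [1e-2, 1e-3]:
    for batch_size in [64, 128]:
        model = build_model(learning_rate=lr)
        history = model.fit(X_train, y_train, epochs=5, batch_size=batch_size,
                            validation_split=0.2, verbose=0)
        val_acc = max(history.history['val_accuracy'])
        if best is None or val_acc > best[0]:
            best = (val_acc, lr, batch_size)
print('Best val accuracy %.3f with lr=%s, batch_size=%s' % best)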
The CIFAR-10 dataset
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There
are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch
contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining
images in random order, but some training batches may contain more images from one class than another.
Between them, the training batches contain exactly 5000 images from each class.
Here are the classes in the dataset, as well as 10 random images from each:
airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks.
"Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes
pickup trucks.
Source Code:
Matplotlib: Matplotlib is a popular plotting library in Python for creating static, animated, and interactive
visualizations. It is often used for creating various types of charts and plots.
%matplotlib inline: This is a magic command in Jupyter notebooks that allows Matplotlib plots to be displayed
directly in the notebook, rather than in a separate window.
LabelEncoder: LabelEncoder is part of scikit-learn and is used for encoding categorical labels into numerical
values. This is often used in machine learning tasks where algorithms require numerical inputs.
Keras: Keras is a high-level neural networks API written in Python. It provides an easy-to-use interface for building
and training deep learning models.
Pandas: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like
DataFrames, which are particularly useful for working with structured data.
NumPy: NumPy is a fundamental package for scientific computing with Python. It provides support for large,
multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.
PIL (Python Imaging Library): PIL is a library for opening, manipulating, and saving many different image file
formats. It has been succeeded by the Pillow library, but the import statement still uses "Image" from "PIL" for
compatibility.
os: The os module provides a way of using operating system-dependent functionality, such as reading or writing
to the file system.
Warnings: The warnings module is used to control the display of warning messages in the code.
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
import keras
import pandas as pd
import numpy as np
from PIL import Image
import os
import warnings
warnings.filterwarnings('ignore')
from google.colab import drive : Imports the drive module from the google.colab package. This module
provides functions for mounting and managing Google Drive in Colab.
drive.mount('/content/drive', force_remount=True) : Mounts your Google Drive to the specified directory (
/content/drive ) in the Colab environment. The force_remount=True parameter is used to force a remount in
case the drive is already mounted.
After running this code, you will be prompted to authenticate and give Colab access to your Google Drive. Once
you've done that, your Google Drive will be accessible from within the Colab notebook, and you can navigate to
the /content/drive directory to access your files.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Mounted at /content/drive
! : In a Jupyter notebook or Colab environment, adding an exclamation mark before a command allows you to run
shell commands directly from the notebook.
unzip : This is a command-line utility for extracting files from a ZIP archive.
-q : Stands for "quiet" mode. It suppresses the output, making the extraction process less verbose.
"/content/drive/MyDrive/DLT/cifar10.zip" : This is the path to the ZIP file you want to unzip. In this case, it's
located in your Google Drive under the path /content/drive/MyDrive/DLT/cifar10.zip .
So, the command is essentially unzipping the contents of the specified ZIP file into the current working
directory in the Colab environment.
!unzip -q "/content/drive/MyDrive/DeepvLearning for Developers/MODULE-3-CNN/cifar10.zip"
labels = pd.read_csv('/content/drive/MyDrive/DLT/cifar10Labels.csv', index_col=0) : Reads a CSV file named
'cifar10Labels.csv' from the specified path into a Pandas DataFrame. The index_col=0 argument sets the first
column of the CSV file as the index of the
DataFrame.
img_idx = 5 : Assigns the index 5 to the variable img_idx . This index is then used to retrieve information about a
specific image.
print(labels.label[img_idx]) : Prints the label of the image at the specified index ( img_idx). It assumes that the
DataFrame has a column named 'label' containing the labels for each image.
Image.open('cifar10/'+str(img_idx)+'.png') : Opens and displays the image with the filename constructed using
the index ( img_idx ). The images are assumed to be located in the 'cifar10' directory.
Please note that the path used in Image.open is relative, so it's looking for images in the 'cifar10' directory.
Ensure that the images are correctly located in that directory, and the file naming convention matches the
expected format (e.g., '5.png' for index 5).
labels = pd.read_csv('/content/drive/MyDrive/DeepvLearning for Developers/MODULE-3-CNN/cifar10Labels.csv', index_col=0)
# View an image
img_idx = 5
print(labels.label[img_idx])
Image.open('cifar10/'+str(img_idx)+'.png')
automobile
It looks like you're splitting your data into training and testing sets using the train_test_split function from
scikit-learn. Afterward, you're reading the images associated with the training and testing indices, converting them to
NumPy arrays, and normalizing the pixel values. Here's a breakdown of your code:
Splitting Data: train_test_split is used to split the indices of your data into training and testing sets.
The test_size=0.3 argument indicates that 30% of the data will be used for testing, and random_state=42
ensures reproducibility.
Storing Indexes for Later Use: The indexes for training and testing sets are stored in train_idx and test_idx for
later use.
Reading and Storing Images for Training: Images associated with training indices are read, converted to NumPy
arrays, and stored in
X_train .
Reading and Storing Images for Testing: Similarly, images associated with testing indices are read, converted to
NumPy arrays, and stored in X_test .
Normalizing Image Data:The pixel values of the images are normalized to the range [0, 1] by dividing each pixel
value by 255.
# Splitting data into Train and Test data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(labels.index, labels.label, test_size=0.3, random_state=42)
train_idx, test_idx = y_train.index, y_test.index  # Storing indexes for later use

# Reading images for training
temp = []
for img_idx in y_train.index:
    img_path = os.path.join('cifar10/', str(img_idx) + '.png')
    img = np.array(Image.open(img_path)).astype('float32')
    temp.append(img)
X_train = np.stack(temp)

# Reading images for testing
temp = []
for img_idx in y_test.index:
    img_path = os.path.join('cifar10/', str(img_idx) + '.png')
    img = np.array(Image.open(img_path)).astype('float32')
    temp.append(img)
X_test = np.stack(temp)

# Normalizing image data
X_train = X_train / 255.
X_test = X_test / 255.

print(X_train.shape, y_train.shape)
(35000, 32, 32, 3) (35000,)
LabelEncoder instantiation:Creates an instance of the LabelEncoder class. This is used to encode the
categorical labels into numerical values.
Fit and transform the training labels: Fits the encoder on the training labels ( y_train ) and transforms them into
numerical values. This step is necessary to convert categorical labels into a format suitable for machine learning
models.
One-hot encoding: Uses the to_categorical function from Keras to perform one-hot encoding on the
transformed labels. This converts the numerical labels into a binary matrix representation suitable for training
neural networks. After this process, y_train will contain the one-hot encoded representations of your original
categorical labels. Each row in y_train corresponds to a training sample, and each column represents a class,
with a value of 1 indicating the presence of that class.
# One-hot encoding 10 output classes
from keras.utils import to_categorical
encode_X = LabelEncoder()
encode_X_fit = encode_X.fit_transform(y_train)
y_train = to_categorical(encode_X_fit)

# One-hot encoding 10 output classes
encode_X = LabelEncoder()
encode_X_fit = encode_X.fit_transform(y_test)
y_test = to_categorical(encode_X_fit)
First Convolutional Layer ( Conv_1 ):
32 filters, each with a 4x4 kernel. ReLU activation function.
L2 regularization with a strength of 0.001.
Input shape is (32, 32, 3), assuming RGB images.
Batch Normalization ( BN_1 ):
Normalizes the parameters from the previous layer to speed up performance (optional).
First MaxPooling Layer ( MaxPool_1 ):
Pooling layer with a 2x2 pool size.
Second Convolutional Layer ( Conv_2 ):
64 filters, each with a 4x4 kernel. ReLU activation function.
L2 regularization with a strength of 0.001.
Batch Normalization ( BN_2 ):
Normalizes the parameters from the previous layer.
Second MaxPooling Layer ( MaxPool_2 ):
Pooling layer with a 2x2 pool size.
Flatten Layer ( Flat ):
Flattens the input from the previous layer to a 1D array.
Fully-Connected Layer ( pred_layer ):
Dense layer with num_classes neurons (assuming 10 classes) and softmax activation for classification.
from keras.layers import LeakyReLU
num_classes = 10
model = keras.models.Sequential([
    # Adding first convolutional layer
    keras.layers.Conv2D(filters=32, kernel_size=(5, 5), strides=1, padding='valid', activation='LeakyReLU',
                        input_shape=(32, 32, 3), name='Conv_1', batch_size=128),
    # Normalizing the parameters from last layer to speed up the performance (optional)
    # Adding first pooling layer
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_1'),
    keras.layers.Dropout(0.2),
    # Adding second convolutional layer
    keras.layers.Conv2D(filters=64, kernel_size=(5, 5), strides=1, padding='valid', activation='LeakyReLU',
                        name='Conv_2', batch_size=128),
    # Adding second pooling layer
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_2'),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(filters=128, kernel_size=(5, 5), strides=1, padding='valid', activation='LeakyReLU',
                        name='Conv_3', batch_size=128),
    # Third pooling layer and fourth convolutional block left commented out
    # keras.layers.MaxPool2D(pool_size=(3, 3), name='MaxPool_3'),
    # keras.layers.Conv2D(filters=128, kernel_size=(5, 5), strides=1, padding='valid', activation='LeakyReLU',
    #                     name='Conv_4', batch_size=128),
    # keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_4'),
    # Flattens the input
    keras.layers.Flatten(name='Flat'),
    # Fully-Connected layers
    keras.layers.Dense(100, activation='sigmoid', name='pred1_layer'),
    keras.layers.Dense(num_classes, activation='softmax', name='pred_layer')
])
Loss Function ( loss='categorical_crossentropy' ):Categorical crossentropy is commonly used for multi-class
classification problems. It measures the difference between the predicted probabilities and the true one-hot
encoded class labels.
Optimizer ( optimizer=keras.optimizers.Adam() ):Adam is an optimization algorithm that adapts the learning rate
during training. It is well-suited for a variety of machine learning tasks, including deep learning.
Metrics ( metrics=['accuracy'] ):During training, the model will monitor the accuracy metric. This provides the
percentage of correctly classified samples out of the total.
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
cpfile = r'CIFAR10_checkpoint_filter_4.hdf5' :Specifies the file path where the model weights will be saved in
HDF5 format. Adjust the path and filename as needed.
cb_checkpoint = keras.callbacks.ModelCheckpoint(...) :
Creates a model checkpoint callback.
monitor='val_acc' : Monitors the validation accuracy during training.
verbose=1 : Prints a message when a checkpoint is saved.
save_best_only=True : Saves only the best model based on the validation accuracy.
mode='max' : The checkpoint is saved when the monitored quantity ( val_acc ) is maximized.
epochs : Specifies the number of training epochs (set to 30 in the code below).
model.fit(...) :
Trains the model using the training data ( X_train , y_train ).
validation_split=0.2 : Allocates 20% of the training data for validation.
callbacks=[cb_checkpoint] : Utilizes the defined checkpoint callback during training.
print(X_train.shape,y_train.shape)
(35000, 32, 32, 3) (35000, 10)
#cpfile = r'CIFAR10_checkpoint_filter_4.hdf5' # Weights to be stored in HDF5 format
#cb_checkpoint = keras.callbacks.ModelCheckpoint(cpfile, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
epochs = 30
model.fit(X_train, y_train, epochs=epochs, validation_split=0.2)
(Per-epoch val_accuracy values are cut off in the record.)
Epoch 1/30 - 875/875 - 63s 71ms/step - loss: 1.6262 - accuracy: 0.4070 - val_loss: 1.3718
Epoch 2/30 - 875/875 - 61s 70ms/step - loss: 1.3142 - accuracy: 0.5263 - val_loss: 1.2849
Epoch 3/30 - 875/875 - 62s 71ms/step - loss: 1.1692 - accuracy: 0.5876 - val_loss: 1.2412
Epoch 4/30 - 875/875 - 62s 71ms/step - loss: 1.0734 - accuracy: 0.6211 - val_loss: 1.1623
Epoch 5/30 - 875/875 - 60s 68ms/step - loss: 0.9884 - accuracy: 0.6515 - val_loss: 1.0747
Epoch 6/30 - 875/875 - 67s 76ms/step - loss: 0.9279 - accuracy: 0.6740 - val_loss: 1.0356
Epoch 7/30 - 875/875 - 64s 73ms/step - loss: 0.8788 - accuracy: 0.6925 - val_loss: 1.0096
Epoch 8/30 - 875/875 - 62s 71ms/step - loss: 0.8296 - accuracy: 0.7103 - val_loss: 0.9798
Epoch 9/30 - 875/875 - 61s 70ms/step - loss: 0.7854 - accuracy: 0.7243 - val_loss: 0.9746
Epoch 10/30 - 875/875 - 62s 71ms/step - loss: 0.7562 - accuracy: 0.7363 - val_loss: 0.9749
Epoch 11/30 - 875/875 - 59s 68ms/step - loss: 0.7247 - accuracy: 0.7453 - val_loss: 0.9452
Epoch 12/30 - 875/875 - 63s 73ms/step - loss: 0.6945 - accuracy: 0.7578 - val_loss: 0.9468
Epoch 13/30 - 875/875 - 64s 73ms/step - loss: 0.6698 - accuracy: 0.7649 - val_loss: 0.9679
Epoch 14/30 - 875/875 - 61s 70ms/step - loss: 0.6556 - accuracy: 0.7702 - val_loss: 0.9648
Epoch 15/30 - 875/875 - 61s 70ms/step - loss: 0.6320 - accuracy: 0.7794 - val_loss: 0.9883
Epoch 16/30 - 875/875 - 61s 69ms/step - loss: 0.6158 - accuracy: 0.7818 - val_loss: 0.9931
Epoch 17/30 - 875/875 - 62s 71ms/step - loss: 0.5991 - accuracy: 0.7910 - val_loss: 1.0092
Epoch 18/30 - 875/875 - 61s 69ms/step - loss: 0.5838 - accuracy: 0.7949 - val_loss: 1.0224
Epoch 19/30 - 875/875 - 63s 72ms/step - loss: 0.5780 - accuracy: 0.7963 - val_loss: 0.9906
Epoch 20/30 - 875/875 - 63s 72ms/step - loss: 0.5611 - accuracy: 0.8019 - val_loss: 0.9947
Epoch 21/30 - 875/875 - 65s 74ms/step - loss: 0.5577 - accuracy: 0.8053 - val_loss: 1.0092
Epoch 22/30 - 875/875 - 61s 70ms/step - loss: 0.5434 - accuracy: 0.8095 - val_loss: 1.0150
Epoch 23/30 - 875/875 - 61s 70ms/step - loss: 0.5458 - accuracy: 0.8068 - val_loss: 1.0408
Epoch 24/30 - 875/875 - 62s 70ms/step - loss: 0.5341 - accuracy: 0.8112 - val_loss: 1.0476
Epoch 25/30 - 875/875 - 59s 68ms/step - loss: 0.5273 - accuracy: 0.8128 - val_loss: 1.0004
Epoch 26/30 - 875/875 - 62s 70ms/step - loss: 0.5238 - accuracy: 0.8144 - val_loss: 1.0359
Epoch 27/30 - 875/875 - 63s 72ms/step - loss: 0.5127 - accuracy: 0.8198 - val_loss: 1.0636
Epoch 28/30 - 875/875 - 64s 74ms/step - loss: 0.5080 - accuracy: 0.8206 - val_loss: 1.0647
Epoch 29/30 - (remaining epochs truncated in the record)
actual_train and predicted_train for Training Set:
actual_train : Extracts the actual class labels by finding the index of the maximum value in each one-hot encoded
label in y_train .
predicted_train : Uses the trained model to predict class labels for the training set.
Prints Training Accuracy:
print('Train accuracy: ', scoring(actual_train, predicted_train, metric='accuracy') * 100) : Calculates and prints
the training accuracy using the scoring function, where 'accuracy' is one of the supported metrics.
actual_test and predicted_test for Test Set:
actual_test : Similar to actual_train , extracts the actual class labels for the test set.
predicted_test : Uses the trained model to predict class labels for the test set.
Prints Test Accuracy:
print('Test accuracy: ', scoring(actual_test, predicted_test, metric='accuracy') * 100) : Calculates and prints
the test accuracy using your custom scoring function.
from sklearn.metrics import accuracy_score
predicted_train = model.predict(X_train)
prediction = np.where(predicted_train < 0.5, 0, 1)
accuracy_score(y_train, prediction)
1094/1094 [==============================] - 18s 16ms/step
0.8165142857142857
from sklearn.metrics import accuracy_score
predicted_test = model.predict(X_test)
prediction = np.where(predicted_test < 0.5, 0, 1)
accuracy_score(y_test, prediction)
469/469 [==============================] - 9s 19ms/step
EXPERIMENT - 4
AIM: Implement a Recurrent Neural Network for Predicting Sequential Data.
DESCRIPTION:
RNN :- A recurrent neural network (RNN) is a type of artificial neural network that is used in deep learning and
in the development of models that imitate the activity of neurons in the human brain. RNNs are mainly used in
speech recognition and natural language processing (NLP).
RNNs recognize the sequential characteristics of data and use patterns to predict the next likely scenario. They are
adapted to work with time series data or data that involves sequences. Ordinary feed-forward neural networks
are only meant for data points that are independent of each other.
RNNs are characterized by the direction of the flow of information between their layers. Gated variants such as the
LSTM have three gates that determine whether or not to let new input in, delete the information because it isn't
important, or let it impact the output at the current time step.
The Gated Recurrent Unit (GRU) network is another type of RNN designed to address the vanishing gradient
problem. It has two gates: the reset gate and the update gate.
LSTM
LSTM networks are the most commonly used variation of Recurrent Neural Networks (RNNs). The critical
components of the LSTM are the memory cell and the gates (including the forget gate as well as the input gate);
the inner contents of the memory cell are modulated by the input and forget gates. Assuming that both of
these gates are closed, the contents of the memory cell remain unmodified between one time-step and
the next. This gating structure allows information to be retained across many time-steps, and
consequently also allows gradients to flow across many time-steps. This lets the LSTM model overcome
the vanishing gradient problem that occurs with most Recurrent Neural Network models.
In this module, you will learn how a neural network can benefit from predicting a sequential data set like a
time series or sentence formation. To start with, we use the Infosys Equities data set from January 1st, 2000
till December 31st, 2009, giving us a total of 10 years of data. The National Stock Exchange market opens only
on weekdays, excluding weekends and national holidays; therefore, you can't expect data for all 365/366 days
a year.
Dataset
For our time series, we will be considering only two features, Date and Average Price.
Source Code:
The provided program uses a Recurrent Neural Network (an LSTM) to predict sequential data. It loads the Infosys
equities time series, scales it, trains the model on a portion of the data, evaluates its performance on the remaining
portion, and makes predictions.
The output includes the training loss during training, the RMSE on the training and test sets after evaluation, and,
finally, a plot of the model's predictions against the true values.
import keras
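Apart from import keras, the imports this experiment relies on are not shown in the record; a plausible set, based on the code that follows, would be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import LSTM, Dense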
This is a common step when working with Colab to access files and datasets stored in your Google Drive.
from google.colab import drive: This line imports the drive module from the google.colab package. Colab
provides this module to interact with Google Drive.
drive.mount('/content/drive', force_remount=True): This line mounts your Google Drive to the Colab
environment. It prompts you to visit a link, authorize access to your Google Drive, and enter an
authorization code. After successful authentication, your Google Drive will be mounted at the specified
path ('/content/drive'), and the content will be accessible within your Colab notebook.
The force_remount=True parameter is used to force a remount of Google Drive, even if it has been
mounted before. This can be useful if you want to ensure that the most up-to-date content from your
Google Drive is available in the Colab environment.
After running this code cell, you should see a prompt that asks you to follow a link to authorize access to
your Google Drive. Once you complete the authorization, you'll get a code to enter in the cell, and your
Google Drive will be mounted. You can then navigate to the mounted path to access your Google Drive
files.
# Loading data
data = pd.read_csv('/content/drive/MyDrive/dlt/INFY20002008.csv')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2496 entries, 0 to 2495
Data columns (total 16 columns):
MinMaxScaler(feature_range=(0, 1)): This line creates an instance of the MinMaxScaler class from scikit-learn.
The feature_range parameter is set to (0, 1), which means that the transformed data will be scaled to the
range [0, 1].
data.loc[:, 'Average Price'].values.reshape(-1, 1): This extracts the 'Average Price' column from the DataFrame
(data). The .values attribute converts it to a NumPy array, and .reshape(-1, 1) reshapes it into a single column.
This is necessary because the fit_transform method expects a 2D array or matrix as input.
scaler.fit_transform(...): This fits the scaler to the data and transforms the data simultaneously. The
fit_transform method computes the minimum and maximum values of the data and scales it to the specified
range.
The scaled values are stored in the scaled_price variable, and these scaled values can be used for training
machine learning models or other analyses where scaled features are beneficial.
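The scaling code being described does not appear in the record; based on the description, it would be:
# Scale the 'Average Price' column to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_price = scaler.fit_transform(data.loc[:, 'Average Price'].values.reshape(-1, 1))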
# Splitting dataset in the ratio of 75:25 for training and test
train_size = int(data.shape[0] * 0.75)
train, test = scaled_price[0:train_size, :], scaled_price[train_size:data.shape[0], :]
print("Number of entries (training set, test set): " + str((len(train), len(test))))
Number of entries (training set, test set): (1872, 624)
1. train_size = int(data.shape[0] * 0.75): This line calculates the size of the training set. It takes 75%
of the total number of entries in the dataset ( data ) and converts it to an integer using int().
2. train, test = scaled_price[0:train_size, :], scaled_price[train_size:data.shape[0], :] : This line
actually performs the split. It uses array slicing to create two sets, train and test , from the scaled_price
array. The training set includes the first train_size entries,
and the test set includes the remaining entries.
3. print("Number of entries (training set, test set): " + str((len(train), len(test)))) : This line prints
the number of entries in the training and test sets.
After running this code, you'll have two sets ( train and test ) that you can use for training and evaluating
your machine learning model. The training set will contain 75% of the data, and the test set will contain
the remaining 25%.
1. window_size = 3 : You've chosen a window_size of 3, indicating that each input sequence for the
LSTM model will consist of three time steps.
2. train_X, train_Y = create_dataset(train, window_size) : This line uses the create_dataset function to
generate training sets
( train_X and train_Y ) from the train data. train_X will contain input sequences, and train_Y will
contain corresponding output values.
3. test_X, test_Y = create_dataset(test, window_size) : Similarly, this line generates test sets ( test_X and
test_Y ) using the
create_dataset function from the test data.
4. print("Original training data shape:") : This line prints the shape of the original training data ( train_X )
before reshaping.
5. print(train_X.shape) : This line prints the shape of the original training data ( train_X ), which
represents the number of input sequences, the number of time steps, and the size of each
time step.
6. train_X = np.reshape(train_X, (train_X.shape[0], 1, train_X.shape[1])) : This line reshapes the
training input data to fit the expected input shape for an LSTM model. The new shape is
(number of input sequences, 1, number of time steps) .
7. test_X = np.reshape(test_X, (test_X.shape[0], 1, test_X.shape[1])) : Similarly, this line reshapes
the test input data to match the format required by the LSTM model.
8. print("New training data shape:") : This line prints the shape of the reshaped training data ( train_X
). This reshaping is necessary because LSTM models in Keras expect input data in the shape (number
of samples, number of time steps,number of features) . In your case, each input sequence is treated as
a single sample with three time steps and one feature.
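The create_dataset helper and the reshaping calls described above are not shown in the record; a common implementation of this windowing step (the extra -1 in the loop bound matches the 1,868 training samples seen in the log below) is:
def create_dataset(dataset, window_size=1):
    data_X, data_Y = [], []
    for i in range(len(dataset) - window_size - 1):
        data_X.append(dataset[i:(i + window_size), 0])   # window of past values
        data_Y.append(dataset[i + window_size, 0])       # value to predict
    return np.array(data_X), np.array(data_Y)

window_size = 3
train_X, train_Y = create_dataset(train, window_size)
test_X, test_Y = create_dataset(test, window_size)
print("Original training data shape:")
print(train_X.shape)
# Reshape to (samples, time steps, features) as expected by the LSTM layer
train_X = np.reshape(train_X, (train_X.shape[0], 1, train_X.shape[1]))
test_X = np.reshape(test_X, (test_X.shape[0], 1, test_X.shape[1]))
print("New training data shape:")
print(train_X.shape)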
The LSTM architecture here consists of a single LSTM layer followed by a Dense output layer (see the numbered breakdown below). Training for three epochs produced:
Epoch 1/3
1868/1868 [==============================] - 4s 2ms/step - loss: 0.0055
Epoch 2/3
1868/1868 [==============================] - 3s 2ms/step - loss: 3.9482e-04
Epoch 3/3
1868/1868 [==============================] - 3s 1ms/step - loss: 3.7661e-04
<keras.src.callbacks.History at 0x7b1dbf77dd20>
1. model = Sequential() : This line initializes a sequential model. A sequential model is appropriate for a
plain stack of layers where each layer has exactly one input tensor and one output tensor.
2. model.add(LSTM(4, input_shape=(1, window_size))) : This line adds an LSTM layer to the model.
The layer has 4 units (LSTM cells), and the input_shape parameter is set to (1, window_size) ,
indicating that the input sequences have one time step and window_size features.
3. model.add(Dense(1)) : This line adds a dense (fully connected) layer with 1 unit to the model. This
layer is responsible for producing the output of the model.
4. model.compile(loss="mean_squared_error", optimizer="adam") : This line compiles the model.
The chosen loss function is mean squared error ( "mean_squared_error" ), and the optimizer is
Adam ( "adam" ). The loss function is a measure of how well the model is performing, and the
optimizer is responsible for updating the model's weights during training to minimize the loss.
5. model.fit(train_X, train_Y, epochs=3, batch_size=1) : This line trains the model using the
training data ( train_X as input and train_Y as target). The training is performed for 3 epochs
with a batch size of 1. An epoch is one complete pass through the entire training dataset.
After running this code, your LSTM model will be trained on the specified data. You can then use the
trained model for making predictions on new data or evaluating its performance on the test set.
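Assembled from the numbered description above (the code itself is missing from the record), the model-building and training cell would be:
model = Sequential()
model.add(LSTM(4, input_shape=(1, window_size)))   # 4 LSTM units; input: 1 time step with window_size features
model.add(Dense(1))                                # single output value (the next average price)
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(train_X, train_Y, epochs=3, batch_size=1)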
def predict_and_score(model, X, Y):
    # Make predictions on the original scale of the data.
    pred = scaler.inverse_transform(model.predict(X))
    # Prepare Y data to also be on the original scale for interpretability.
    orig_data = scaler.inverse_transform([Y])
    # Calculate RMSE.
    score = np.sqrt(mean_squared_error(orig_data[0], pred[:, 0]))
    return (score, pred)

rmse_train, train_predict = predict_and_score(model, train_X, train_Y)
rmse_test, test_predict = predict_and_score(model, test_X, test_Y)

print("Training data score: %.2f RMSE" % rmse_train)
print("Test data score: %.2f RMSE" % rmse_test)
print("Training data score: %.2f RMSE" % rmse_train) : This line prints the RMSE for the training set.
print("Test data score: %.2f RMSE" % rmse_test) : This line prints the RMSE for the test set.
These RMSE scores can be used to evaluate the performance of your LSTM model on both the training and
test datasets. Lower RMSE values indicate better model performance.
# Start with training predictions.
train_predict_plot = np.empty_like(scaled_price)
train_predict_plot[:, :] = np.nan
train_predict_plot[window_size:len(train_predict) + window_size, :] = train_predict
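The matching placement of the test predictions and the plotting calls are not shown in the record; a common completion (the shift offsets here are assumptions) is:
# Shift test predictions so they follow the training region
test_predict_plot = np.empty_like(scaled_price)
test_predict_plot[:, :] = np.nan
test_predict_plot[len(train_predict) + (window_size * 2) + 1:len(scaled_price) - 1, :] = test_predict

plt.figure(figsize=(15, 5))
plt.plot(scaler.inverse_transform(scaled_price), label="True value")
plt.plot(train_predict_plot, label="Training predictions")
plt.plot(test_predict_plot, label="Test predictions")
plt.xlabel("Days")
plt.ylabel("Average Price")
plt.legend()
plt.show()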
This plot allows you to visually compare the true values with the predicted values for both the training and
test sets.
EXPERIMENT-5
Description:
Noise: Humans are prone to making mistakes when collecting data, and data collection instruments may be
unreliable, resulting in dataset errors. The errors are referred to as noise. Data noise in deep learning can cause
problems since the algorithm interprets the noise as a pattern and can start generalizing from it.
Noise Removal: In the proposed algorithm, the training process consists of three successive steps. In the first
step, a classifier is trained to classify the noisy and clean images. In the second step, a denoiser network aims
to remove the noise in the image features that are extracted by the trained classifier. Finally, a decoder is utilized
to map the denoised image features back into image pixels.
Denoising: Denoising is an advanced technique used to decrease grainy spots and discoloration in images while
minimizing the loss of quality.
Image Denoising is a computer vision task that involves removing noise from an image.
Image denoising plays an important role in a wide range of applications such as image restoration, visual
tracking, image registration, image segmentation, and image classification, where obtaining the original image
content is crucial for strong performance.
The median filter is excellent for denoising an image in the case of salt-and-pepper noise because it does not
blur the image the way a mean filter would. Despite its name, the median filter is not a linear filter, because it does not
respect the linearity property; therefore it cannot be written as a convolution.
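As a small illustration (not part of the record), salt-and-pepper noise can be removed with a median filter using OpenCV; the file name below is hypothetical:
import numpy as np
import cv2

img = cv2.imread('sample.png')            # hypothetical input image
noisy = img.copy()
mask = np.random.rand(*img.shape[:2])
noisy[mask < 0.02] = 0                    # pepper noise
noisy[mask > 0.98] = 255                  # salt noise
denoised = cv2.medianBlur(noisy, 3)       # 3x3 median filter removes the speckles without blurring edges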
Dataset:
For Image Denoising we use the CIFAR-10 datasets.
➢ The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images that are commonly
used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for
machine learning research.
➢ The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images
dataset and consists of 60000 32x32 color images. The 100 classes in the CIFAR-100 are grouped into 20 super
classes. There are 600 images per class.
➢ Image Classification is a method to classify images into their respective category classes. The CIFAR-10
dataset, as the name suggests, has 10 different categories of images in it. There is a total of 60,000 images of 10 different
classes, namely Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck.
➢ CIFAR-10 is a dataset containing 60,000 images of 10 classes and is considered a typical dataset for computer
vision problems. In this assignment, I'll try out a few neural network configurations and hyperparameters and
compare the results.
➢ The Range of cifar-10 is 0 to 255.
➢ The error rate of a human on CIFAR-10 is estimated to be around 6%, which means that a model achieving
above 94% accuracy will be regarded as a super-human performance. According to paperswithcode.com, the
best model can reach 99% accuracy on CIFAR-10.
➢ For the CIFAR-10 results the best test accuracy corresponds to batch sizes m=4 and m=8, although quite good
results are maintained out to m=128.
➢ CIFAR-10 is an established computer-vision dataset used for object
recognition. It is a subset of the 80 million tiny images dataset and consists of 60,000 32x32 color images
containing one of 10 object classes, with 6000 images per class. It was collected by Alex Krizhevsky, Vinod Nair,
and Geoffrey Hinton.
➢ CIFAR-10 – An image classification dataset consisting of ten classes of sixty thousand images. There are five
training batches and one test batch in the dataset and there are 10000 images in each batch. The size is 170
MB.
➢ CINIC-10 is a dataset for image classification. It has a total of 270,000 images, 4.5 times that of CIFAR-10. It
is constructed from two different sources: ImageNet and CIFAR-10. Specifically, it was compiled as a bridge
between CIFAR-10 and ImageNet.
➢ The convolutional neural network used for CIFAR-10 here has 4 convolution and pooling layers followed by 2 fully
connected layers.
➢ CIFAR-10 is a well-understood dataset and widely used for benchmarking computer vision algorithms in the
field of machine learning. The problem is “solved.” It is relatively straightforward to achieve 80% classification
accuracy.
Source Code:
The import statement import pandas in Python brings the entire pandas library into your script or Jupyter
Notebook. However, it's a common convention to use the alias pd to refer to pandas. This makes it more
concise and is widely adopted in the data science community.
Importing NumPy: The import numpy part brings the entire NumPy library into your script or Jupyter
Notebook. NumPy is a powerful library for numerical operations and array manipulations in Python.
Assigning an Alias: The as np part gives NumPy a shorter alias, in this case, np. This is a widely adopted
convention and makes it more convenient to refer to NumPy functions and classes throughout your code.
When you use the statement import matplotlib.pyplot as plt in Python, you are importing the pyplot
module from the Matplotlib library and assigning it the alias plt. This is a common convention in the data
visualization community, making it more convenient to refer to Matplotlib functions and classes
throughout your code.
When you use the statement from PIL import Image in Python, you are importing the Image module from
the Python Imaging Library (PIL) or, more commonly, from its fork called "Pillow." Pillow is a powerful
library for working with images in various formats.
When you use the statement import os in Python, you are importing the os module, which provides a way to
interact with the operating system. The os module allows you to perform various operating system-related tasks,
such as working with file systems, directories, and environment variables.
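Collected in one cell, the imports described above would typically look like this (a sketch; the original cell may contain additional libraries):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os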
It is specific to the Google Colab environment, which is a free, cloud-based platform for running Python
code, especially popular in the context of machine learning and data science.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Mounted at /content/drive
!: In Jupyter Notebooks or Colab, the exclamation mark ! is used to run shell commands
directly from the notebook cells. unzip: This is the command-line utility for unzipping files.
-q: This flag stands for "quiet" and is used to suppress the output of the unzip command. It means the
command will not print the names of the files and directories as they are being extracted.
"/content/drive/MyDrive/DLT/cifar10.zip": This is the path to the ZIP file that you want to unzip. You're
specifying the full path to a file named "cifar10.zip" in your Google Drive under the
"/content/drive/MyDrive/DLT/" directory.
When you run this command in a Colab notebook, it will unzip the contents of the "cifar10.zip" file into the
current directory. The -q flag is used to do this quietly, without printing each file as it's being extracted.
!unzip -q "/content/drive/MyDrive/DLT/cifar10.zip"
The np.array function is used to create a NumPy array, which is a multi-dimensional, homogeneous, and
flexible array object. NumPy arrays are more efficient than Python lists for numerical operations and are a
cornerstone for numerical computing tasks, including data analysis,
machine learning, and scientific research.
The plt.imshow() function is used to display an image or 2D array as a plot. It is particularly useful for
visualizing images in the context of data analysis, computer vision, and image processing.
The plt.show() function is a part of the Matplotlib library in Python and is used to display the current figure
that has been created using Matplotlib functions. It is commonly used in scripts or Jupyter Notebooks to
render and show the Matplotlib plots.
img = np.array(Image.open('cifar10/5.png'))
plt.imshow(img)
plt.show()
img_arr = []
The line img_arr = np.array(img_arr) is converting the Python list img_arr, which contains NumPy arrays
representing images, into a single NumPy array. This is often done to create a structured and efficient
representation of a dataset for further processing or analysis.
If img_arr is a NumPy array resulting from the code img_arr = np.array(img_arr), you can check its shape using the
.shape attribute. The shape of the NumPy array provides information about the number of dimensions and the
size of each dimension.
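A sketch of how img_arr is typically filled before that conversion (the number of images, the file-name pattern, and the scaling to the [0, 1] range are assumptions based on how the arrays are used later in this experiment):

img_arr = []
for i in range(120):                                   # assumed count, matching the training slice used below
    img = np.array(Image.open('cifar10/' + str(i) + '.png'))
    img_arr.append(img)

img_arr = np.array(img_arr) / 255.0                    # stack into one array and scale pixels to [0, 1] (assumed)
print(img_arr.shape)                                   # e.g. (120, 32, 32, 3)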
The term "noise factor" typically refers to a parameter or variable used to introduce random variations or disturbances
into a system, process, or data. It is commonly employed in various fields such as signal processing, communication
systems, simulations, and machine learning.
Here the noise_factor is used to add random noise to the images stored in the img_arr variable. The code draws random numbers from a Gaussian (normal) distribution and adds them to each pixel of the image.
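A sketch of the noise-injection step (the exact value of noise_factor is not reproduced in the record, so 0.1 below is an assumption):

noise_factor = 0.1                                                          # assumed value
noisy_imgs = img_arr + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=img_arr.shape)
noisy_imgs = np.clip(noisy_imgs, 0.0, 1.0)                                  # keep pixels in the valid [0, 1] range (added safeguard)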
Matplotlib is then used to visualize one of the noisy images. The plt.imshow() function is commonly used to display images in Python. The plt.show() line is included because it is what actually renders the plot: when working with Matplotlib in interactive environments or scripts, this function displays the figure created with plt.imshow().
plt.imshow((noisy_imgs[4]*255).astype(np.uint8))
plt.show()
It looks like you're importing layers from the Keras library, which is commonly used for building neural network models in
Python. The layers you've imported are typical components of a Convolutional Autoencoder, a type of neural network
architecture used for tasks like image
denoising and dimensionality reduction. Input: Input is used to create an input tensor. It defines the shape of the input
data that will be fed into the model.
Conv2D: Conv2D is a 2D convolutional layer. It performs convolutional operations on 2D input data, which is often used
for image processing tasks. This layer is responsible for learning spatial hierarchies of features.
MaxPooling2D: MaxPooling2D is a downsampling layer that performs max pooling operation on the spatial dimensions of
the input. It helps reduce the spatial dimensions of the representation and retains the most important information.
UpSampling2D: UpSampling2D is an upsampling layer that increases the spatial dimensions of the input. It is often used in
combination with convolutional layers to learn to reconstruct the input data.
The Model class from Keras is used to instantiate a model in Keras, which can be a complete neural network or a
submodel (e.g., an encoder or decoder in an autoencoder). This class allows you to define the input and output of the
model, essentially specifying the architecture.
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
def auto_encoder is used to define a function called auto_encoder that implements the autoencoder. An autoencoder is a type of neural network architecture used for unsupervised learning, particularly for dimensionality reduction and feature learning.
f_size: a variable f_size is defined with a value of 3, representing the filter size. In convolutional neural networks (CNNs), the filter size refers to the dimensions of the convolutional kernel.
p_size: a variable p_size is defined with a value of 1, representing the pool size. In CNNs, the pool size refers to the dimensions of the pooling window used in max pooling or average pooling layers.
conv_1 = Conv2D(32, (f_size, f_size), activation='relu', padding='same')(img) defines a convolutional layer in Keras with 32 filters, a filter size of (f_size, f_size), ReLU activation, and 'same' padding. This is a common configuration for a convolutional layer in a neural network.
pool_1 = MaxPooling2D(pool_size=(p_size, p_size))(conv_1) defines that we are adding a MaxPooling2D layer to your
neural network. This layer is commonly used to downsample the spatial dimensions of the input by taking the maximum
value from a pool of values.
conv_2 = Conv2D(64, (f_size, f_size), activation='relu', padding='same')(pool_1) adds another convolutional layer (conv_2) to the neural network. This is a common practice in deep learning models, where each convolutional layer is designed to learn increasingly complex features from the input data.
pool_2 = MaxPooling2D(pool_size=(p_size, p_size))(conv_2) says that we are adding another MaxPooling2D layer
(pool_2) to your neural network. This is a common practice, especially in convolutional neural networks, where pooling
layers are used to downsample the spatial dimensions of the input.
conv_3 = Conv2D(128, (f_size, f_size), activation='relu', padding='same')(pool_2) adds a third convolutional layer with 128 filters, ReLU activation, and 'same' padding.
conv_4 = Conv2D(128, (f_size, f_size), activation='relu', padding='same')(conv_3) adds a fourth convolutional layer, also with 128 filters, ReLU activation, and 'same' padding.
up_1 = UpSampling2D((p_size, p_size))(conv_4) adds an UpSampling2D layer (up_1) to the neural network. The UpSampling2D layer is used to increase the spatial dimensions of the input, effectively "upsampling" the feature maps.
conv_5 = Conv2D(64, (f_size, f_size), activation='relu', padding='same')(up_1) adds another convolutional layer with 64 filters, ReLU activation, and 'same' padding.
up_2 = UpSampling2D((p_size, p_size))(conv_5) adds another UpSampling2D layer (up_2); as before, it increases the spatial dimensions of the input.
decoded = Conv2D(3, (f_size, f_size), activation='sigmoid', padding='same')(up_2) describes the final Conv2D layer (decoded).
Type: convolutional layer. Number of filters: 3, matching the three channels (Red, Green, Blue) of an RGB image. Filter (kernel) size: (f_size, f_size), determining the spatial extent of the weights learned by the layer. Activation function: sigmoid, often used in the last layer of an autoencoder to squash the pixel values between 0 and 1, which suits image data. Padding: 'same', which pads the input so that the output has the same spatial dimensions as the input.
Input to decoded: up_2, the output of the previous UpSampling2D layer. Output of decoded: a three-channel (RGB) image with potentially higher spatial resolution than the input to up_2.
# Decoder module
Input layer (img): Type: Input layer. Shape: (32, 32, 3). This specifies the shape of the input data: a 3D tensor of (height, width, channels), where 32 is the height of the input image, 32 is the width, and 3 is the number of channels, typically the red, green, and blue channels of a color image.
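Putting the layers described above together, the auto_encoder function can be reconstructed roughly as follows (a sketch assembled from the explanations; the exact original cell may differ in small details):

def auto_encoder(img):
    f_size = 3   # filter (kernel) size
    p_size = 1   # pool size

    # Encoder
    conv_1 = Conv2D(32, (f_size, f_size), activation='relu', padding='same')(img)
    pool_1 = MaxPooling2D(pool_size=(p_size, p_size))(conv_1)
    conv_2 = Conv2D(64, (f_size, f_size), activation='relu', padding='same')(pool_1)
    pool_2 = MaxPooling2D(pool_size=(p_size, p_size))(conv_2)
    conv_3 = Conv2D(128, (f_size, f_size), activation='relu', padding='same')(pool_2)

    # Decoder
    conv_4 = Conv2D(128, (f_size, f_size), activation='relu', padding='same')(conv_3)
    up_1 = UpSampling2D((p_size, p_size))(conv_4)
    conv_5 = Conv2D(64, (f_size, f_size), activation='relu', padding='same')(up_1)
    up_2 = UpSampling2D((p_size, p_size))(conv_5)
    decoded = Conv2D(3, (f_size, f_size), activation='sigmoid', padding='same')(up_2)
    return decoded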
model = Model(img, auto_encoder(img)) creates a Keras model using the Model class, where the input is img (the image input tensor) and the output is the result of the auto_encoder function applied to img.
Loss function ('mse'): The choice of mean squared error (MSE) as the loss function suggests that the model is designed for a regression-style task. In the context of autoencoders, MSE is often used when the goal is to minimize the difference between the input and the reconstructed output.
Optimizer ('adam'): The Adam optimizer is specified for training the model. Adam is a popular optimization algorithm that adapts the learning rates for each parameter during training, making it well suited for a variety of tasks.
Compile method: The compile method configures the model for training by specifying the loss function and optimizer. After compiling, the model is ready to be trained using the fit method.
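The corresponding compile call (reconstructed from the loss and optimizer named above) would be:

model.compile(optimizer='adam', loss='mse')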
model.summary(): to obtain a summary of the compiled Keras model, use the summary method. It provides a detailed overview of the model's architecture, including the number of parameters in each layer.
img = Input(shape=(32, 32, 3))
=================================================================
input_1 (InputLayer)        [(None, 32, 32, 3)]        0
... (remaining layers of the model summary output truncated)
Input data (noisy_imgs[:120]): The input data for training is the noisy images; the first 120 samples of the noisy image set are used for training.
Target data (img_arr[:120]): The target data is the clean images corresponding to the noisy inputs; the first 120 samples of the clean image set are used as the training targets.
Number of epochs (epochs=10): Training is performed for 10 epochs; an epoch is one complete pass through the entire training dataset.
Validation split (validation_split=0.2): 20% of the training data (noisy_imgs[:120] and img_arr[:120]) is used as a validation set, and the model's performance on this set is monitored during training.
Batch size (batch_size=1): Training uses a batch size of 1, so the model's parameters are updated after processing each individual sample.
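Put together, the training call described here would look roughly like this (reconstructed from the parameters above):

model.fit(noisy_imgs[:120], img_arr[:120], epochs=10, validation_split=0.2, batch_size=1)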
Epoch 3/10 through Epoch 10/10 (per-epoch loss output truncated)
<keras.src.callbacks.History at 0x7e947b5add80>
pred = model.predict(img_arr): this line assumes that the model has already been trained and has learned a representation of the data. The predictions (pred) are then compared with the original clean images to evaluate how well the model has learned to denoise the input.
pred = model.predict(img_arr)
plt.figure(figsize=(10, 5))
plt.figure(figsize=(10, 5)) creates a new figure for plotting; the figsize parameter specifies the width and height of the figure in inches.
ax1 = plt.subplot2grid((1, 3), (0, 0)) creates a subplot in a grid with 1 row and 3 columns; (1, 3) specifies the grid shape and (0, 0) places the subplot in the first column of the first row. ax1.set_title('Original image', fontsize='large') sets its title, and ax1.imshow(img_arr[4]) displays the original image stored in img_arr[4].
ax2 = plt.subplot2grid((1, 3), (0, 1)) creates a subplot in the second column of the first row. ax2.set_title('Noisy image', fontsize='large') sets its title, and ax2.imshow((noisy_imgs[4]*255).astype('uint8')) displays the noisy image after scaling it by 255 and converting to 'uint8', which brings the pixel values back to the standard 8-bit range.
ax3 = plt.subplot2grid((1, 3), (0, 2)) creates a subplot in the third column of the first row. ax3.set_title('Reconstructed image', fontsize='large') sets its title, and ax3.imshow(pred[4]) displays the reconstructed image produced by the model's predictions.
plt.show() displays the entire figure with all three subplots.
plt.figure(figsize=(10, 5))
plt.show()
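Putting the fragments together, the complete plotting cell would look roughly like this (the ax1/ax2/ax3 assignments are implied by the explanation above):

plt.figure(figsize=(10, 5))

ax1 = plt.subplot2grid((1, 3), (0, 0))
ax1.set_title('Original image', fontsize='large')
ax1.imshow(img_arr[4])

ax2 = plt.subplot2grid((1, 3), (0, 1))
ax2.set_title('Noisy image', fontsize='large')
ax2.imshow((noisy_imgs[4]*255).astype('uint8'))

ax3 = plt.subplot2grid((1, 3), (0, 2))
ax3.set_title('Reconstructed image', fontsize='large')
ax3.imshow(pred[4])

plt.show()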
plt.figure(figsize=(10, 5))
plt.figure(figsize=(10, 5)) creates another new figure, 10 inches wide by 5 inches tall.
ax1 = plt.subplot2grid((1, 3), (0, 0)) creates a subplot in a grid with 1 row and 3 columns, placed in the first column of the first row. ax1.set_title('Original image', fontsize='large') sets its title, and ax1.imshow(img_arr[131]) displays the original image stored in img_arr[131].
ax2 = plt.subplot2grid((1, 3), (0, 1)) creates a subplot in the second column of the first row. ax2.set_title('Noisy image', fontsize='large') sets its title, and ax2.imshow(noisy_imgs[131]) displays the noisy image stored in noisy_imgs[131].
ax3 = plt.subplot2grid((1, 3), (0, 2)) creates a subplot in the third column of the first row.
ax3.set_title('Reconstructed image', fontsize='large') sets the title of this subplot to 'Reconstructed image'.
ax3.imshow(pred[131]):
This line displays the reconstructed image stored in pred[131] within the third subplot (ax3).
The plt.show() command will display the entire figure with all three subplots. This will include the
original image, the corresponding noisy image, and the reconstructed image for the data at index
131.
plt.figure(figsize=(10, 5))
plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
EXPERIMENT - 6
DESCRIPTION:
Object Detection:
Object detection is the task of detecting instances of objects of a specific class within an image or video. Basically,
it locates the existence of objects in an image using a bounding box and assigns the types or classes of the objects
found. For instance, it takes an image as input and generates one or more bounding boxes, each with the class label
attached. These algorithms are powerful enough to handle multi-class classification and localization and objects with
multiple occurrences.
Object detection builds on two related tasks:
• Image classification
• Object localization
Image classification algorithms predict the type or class of an object in an image among a predefined set of classes
that the algorithm was trained for. Usually, input is an image with a single object, such as a cat. Output is a class or
label representing a particular object, often with a probability of that prediction.
Object localization algorithms locate the presence of an object in the image and represent its location with a
bounding box. They take an image with one or more objects as input and output the location of one or more
bounding boxes using their position, height, and width.
Generally, object detection methods can be classified as either neural network-based or non-neural approaches.
Also, some of them are rule-based, where the rule is predefined to match specific objects. Non-neural approaches
require defining features using some feature engineering techniques and then using a method such as a support
vector machine (SVM) to do the classification.
DATASET:
We used the COCO dataset for object detection.
• Common Objects in Context (COCO) is one such example of a benchmarking dataset, used widely throughout the
computer vision research community. It even has applications for general practitioners in the field, too. The
Microsoft Common Objects in Context (COCO) dataset is the gold standard benchmark for evaluating state of the art
of computer vision models.
• COCO contains over 330,000 images, of which more than 200,000 are labelled, across dozens of categories of
objects. COCO is a collaborative project maintained by computer vision professionals from numerous prestigious
institutions, including Google, Caltech, and Georgia Tech.
• The COCO dataset is designed to represent a vast array of things that we regularly encounter in everyday life, from vehicles like bikes to animals like dogs to people. The COCO dataset contains images from over 80 "object" and 91 generic "stuff" categories, which means the dataset can be used for benchmarking general-purpose models more effectively than small-scale datasets.
• In addition, the COCO dataset contains:
1. 121,408 images
2. 883,331 object annotations
• The COCO dataset can be used for multiple computer vision tasks. COCO is commonly used for object detection,
semantic segmentation, and keypoint detection.
• Objects are annotated with a bounding box and class label. This annotation can be used to identify what is in an
image. In the example below, giraffes and cows are identified in a photo of the outdoors.
• The dataset has two main parts: the images and their annotations:
1. The annotations are provided in JSON format, with each file corresponding to a single image.
2. The images are organized into a hierarchy of directories, with the top-level directory containing subdirectories for
the train, test, and validation sets.
• COCO provides three main kinds of annotations:
1. Object detection with bounding box coordinates and full segmentation masks for 80 different objects
2. Stuff image segmentation with pixel maps displaying 91 amorphous background areas
3. Panoptic segmentation identifies items in images based on 80 "things" and 91 "stuff" categories
• The COCO dataset can be used to train object detection models. The dataset provides bounding box coordinates
for 80 different types of objects, which can be used to train models to detect bounding boxes and classify objects in
the images.
• In object detection, the bounding boxes are always rectangular. As a result, if the object contains the curvature
part, it does not help determine its shape. In order to find precisely the shape of the object, we should use some of
the image segmentation techniques.
Source Code:
The Darknet repository is an open-source deep learning framework written in C and CUDA, which supports various
neural network architectures, including YOLO. It is developed by AlexeyAB, and it provides a complete
implementation of the YOLO algorithm. The Darknet framework is well-known for its performance in object
detection tasks and is frequently used by researchers and developers working with YOLO. The code
provided suggests the first step to start working with the YOLO algorithm using the Darknet framework is to clone the
Darknet repository from GitHub. Cloning the repository means downloading a copy of the repository to your local
machine for further development, experimentation, or usage.
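The cloning command itself is not reproduced in the record; it would typically be:

!git clone https://github.com/AlexeyAB/darknet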
remote: Total 15833 (delta 0), reused 0 (delta 0), pack-reused 15833 Receiving objects: 100% (15833/15833), 14.39
MiB | 18.62 MiB/s, done. Resolving deltas: 100% (10666/10666), done.
This code is meant to modify the Makefile of the Darknet repository to enable GPU support and OpenCV (computer
vision library) integration. These modifications are crucial for accelerating the training and inference of YOLO models
and for working with images and videos. Here's an explanation of each line in the code:
%cd darknet: This is a Jupyter Notebook magic command that changes the current working directory to the
"darknet" directory. It assumes that the "darknet" directory is present in the current location, and it's typically where
you would find the Darknet source code.
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile: This line uses the sed command to perform an in-place replacement in
the Makefile of Darknet. It searches for the line that specifies OPENCV=0 and replaces it with OPENCV=1 , enabling
OpenCV support.
!sed -i 's/GPU=0/GPU=1/' Makefile: Similarly, this line replaces GPU=0 with GPU=1 in the Makefile, enabling GPU
support.
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile: This line replaces CUDNN=0 with CUDNN=1 in the Makefile. Enabling
CUDNN (CuDNN) is essential for using NVIDIA's optimized deep learning libraries for improved performance.
%cd darknet
/content/darknet
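Collected in one place, the three Makefile edits described above are:

!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile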
The below code is meant to verify the CUDA (Compute Unified Device Architecture) version installed on a virtual
machine. CUDA is a parallel
computing platform and API developed by NVIDIA, and it is commonly used for GPU-accelerated deep learning and
other scientific computing tasks.
!/usr/local/cuda/bin/nvcc --version: This is a shell command that uses the NVIDIA CUDA Compiler (nvcc) to check the
version of
CUDA installed on the system. It will display the version information in the output. Make sure that the path
/usr/local/cuda/bin/nvcc is correct for your virtual machine. In some cases, you might need to use a different path to
the nvcc executable if CUDA is installed in a
different location. When you run this code, it will show you information about the installed CUDA version, which is
important for compatibility with GPU-accelerated libraries and frameworks like TensorFlow, PyTorch, and Darknet.
!/usr/local/cuda/bin/nvcc --version
This code is a command to build the Darknet framework. Building Darknet will compile the source code, resulting in an
executable file that you can use to run or train object detectors using YOLO models. Here's what the code does:
1. !make: This is a shell command that invokes the make utility. The make utility is commonly used for building
software projects by reading a Makefile (a script that defines how to compile and link the code) and executing
the compilation and
linking commands specified within it. When you run !make, it will execute the compilation process specified in
Darknet's Makefile. This process typically involves compiling the source code, linking libraries, and generating
the executable file that allows you to work with YOLO for object detection.
# make darknet (builds the darknet to make executable file to run or train object detectors)
!make
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv4 2> /dev/null ||
pkg-config --c
The below code is a command to download a pre-trained YOLOv4 model weights file from a specific GitHub release.
Here's what this code does: !wget
https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights`: This command
uses the wget utility to download a file from a given URL. In this case, it's downloading the YOLOv4 model weights file
from the specified GitHub release. The URL points to a specific release version where the YOLOv4 model weights are
hosted. These weights can be used for inference or fine-tuning in your YOLOv4-based object detection tasks. After
running this command, the yolov4.weights file will be downloaded to your current working
directory, and you can use it with the Darknet framework or other YOLO-based implementations for various object
detection tasks. Make sure you have the necessary permissions and disk space to download the file.
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
--2023-12-20 11:45:38--
https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights Resolving
github.com (github.com)... 192.30.255.112
This code defines a set of useful functions for working with images and files in a Google Colab environment. Here's an
explanation of each of these functions:
imShow(path): This function is used to display an image stored at the specified path. It relies on the OpenCV library for
reading and
processing images and Matplotlib for displaying them. The image is read from the given path, resized to a larger size,
and then displayed in the Colab notebook using Matplotlib. This is a handy function for visualizing images.
upload() : This function allows you to upload files to a Google Colab notebook. It uses the files.upload()
function from the google.colab library to prompt the user to upload files. Once the user uploads a file, it is saved to
the current working directory of the Colab notebook.
download(path): This function is used to download a file from a Google Colab notebook. You specify the path
to the file you want to download, and it uses the files.download() function to trigger the download of that file to
your local machine.
import cv2
%matplotlib inline
image = cv2.imread(path)
fig = plt.gcf()
f.write(data)
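The helper cell is only partially reproduced above; a sketch of the full helper functions (commonly used in YOLO/Darknet Colab tutorials, so the details below are assumptions rather than the exact original code):

import cv2
import matplotlib.pyplot as plt
%matplotlib inline
from google.colab import files

def imShow(path):
    image = cv2.imread(path)                                       # read image from disk (BGR)
    height, width = image.shape[:2]
    resized = cv2.resize(image, (3*width, 3*height), interpolation=cv2.INTER_CUBIC)
    fig = plt.gcf()
    fig.set_size_inches(18, 10)
    plt.axis("off")
    plt.imshow(cv2.cvtColor(resized, cv2.COLOR_BGR2RGB))           # convert BGR -> RGB for Matplotlib
    plt.show()

def upload():
    uploaded = files.upload()                                      # prompt the user to pick files
    for name, data in uploaded.items():
        with open(name, 'wb') as f:
            f.write(data)                                          # save each uploaded file to the working directory
            print('saved file', name)

def download(path):
    files.download(path)                                           # trigger a browser download of the file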
This command runs Darknet detection on a test image using the YOLOv4 model and the COCO dataset configuration (a reconstructed form of the full command is shown after this explanation). Here's an explanation of the command:
!./darknet: This is the path to the Darknet executable that you previously built using the make command. The
./ at the beginning specifies that the executable is in the current directory.
detector test: This is the command used to perform object detection with Darknet. It specifies that you want to run a
detection test.
cfg/coco.data: The path to the data configuration file. In this case, it's configured to use the COCO dataset.
cfg/yolov4.cfg: The path to the YOLOv4 model configuration file. This file contains information about the
architecture and settings of the YOLOv4 model.
yolov4.weights: The path to the YOLOv4 model weights file. This file contains the learned parameters of the YOLOv4
model.
data/object3.jpg: The path to the test image on which you want to perform object detection. When you run
this command, Darknet will use the YOLOv4 model to detect objects in the specified image. Detected objects will be
outlined and labeled in the output image. The result of the detection, including bounding boxes and class labels, will
be displayed in the terminal or the Colab notebook, depending on where you're running the code.
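Assembled from the pieces described above, the command is:

!./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/object3.jpg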
(Darknet network-construction output truncated: CUDA stream and cuDNN handle creation, followed by the per-layer table, e.g. layer 6: conv 64, 3 x 3/1, 304 x 304 x 32 -> 304 x 304 x 64, 3.407 BF; layer 27: conv 128, 1 x 1/1, 76 x 76 x 256 -> 76 x 76 x 128, 0.379 BF)
The code imShow('predictions.jpg') is using the previously defined imShow() function to display an image named
"predictions.jpg." This
image is likely generated as a result of running the object detection with Darknet, and it's expected to show the
detected objects with bounding boxes and labels.
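The call itself:

imShow('predictions.jpg')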
This code first moves up one directory ( %cd .. ), which means it navigates out of the "darknet" directory. Then, it calls the
upload() function, which allows you to upload a new image from your computer into your current Google Colab environment.
Afterward, it navigates back to the "darknet" directory.
%cd ..: This command moves up one directory from the current working directory. It's used to navigate out of the
"darknet" directory.
#Upload new image from the computer using defined Upload object
%cd ..
/content
upload()
%cd darknet
# run darknet with YOLOv4 on your personal image! (note yours will not be called highway.jpg so change the name)
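Based on that comment, the command likely looked like the following (highway.jpg being the uploaded file name used in this run):

!./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights highway.jpg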
(Darknet network-construction and detection output truncated)
download('predictions.jpg')
DATASET:-
The MNIST dataset is a widely used dataset in the field of machine learning and computer vision. It is a
collection of 70,000 small, grayscale images of handwritten digits (0 through 9), each of size 28x28 pixels.
The dataset is often used as a benchmark in the development and testing of various machine learning
algorithms, particularly for image classification tasks.
Here are the key details of the MNIST dataset:
1. Size of the Dataset:
- The MNIST dataset consists of 70,000 images in total.
-This dataset is commonly divided into two subsets: a training set and a test set.
2. Training Set:
- The training set typically comprises 60,000 images.
- These images are used to train machine learning models. The model learns patterns and features from
this set.
3. Test Set:
- The test set contains the remaining 10,000 images.
- It is used to evaluate the performance of a trained model on unseen data. This helps assess how well the
model
generalizes to new, previously unseen examples.
4. Image Characteristics:
- Each image in the MNIST dataset is grayscale, meaning it has only one channel (as opposed to RGB
images,
which have three channels for red, green, and blue).
- The images are 28x28 pixels in size, resulting in a total of 784 pixels per image.
5.Labeling:
- Each image is associated with a label indicating the digit it represents (0 through 9). For instance, an
image of
the digit "5" will have a label of 5.
6. Usage in Machine Learning:
- The MNIST dataset is commonly used for training and testing machine learning models, especially for
tasks
related to image classification.
- It serves as a standard benchmark to compare the performance of different algorithms.
7. Challenges:
- While MNIST has been instrumental in the development of many image classification techniques, it is
considered a relatively simple dataset compared to real-world scenarios. Some modern models may
achieve near- perfect accuracy on MNIST, but this does not necessarily guarantee success on more complex
datasets.
8. Availability:
- The MNIST dataset is freely available and can be accessed from various machine learning libraries and repositories. When working with MNIST, it's common to preprocess the data, normalize the pixel values,
and use techniques like convolutional neural networks (CNNs) for effective feature extraction from the
images.
Source Code:
Designing a deep learning network for Robust Bi-Tempered Logistic Loss involves creating a neural network architecture that works effectively with this specialized loss function. Here's a concise breakdown:
Understanding the Loss Function:
Bi-Tempered Logistic Loss is a robust loss function used in classification tasks, especially in scenarios where the data might be noisy or mislabeled. It introduces temperature parameters and a scaling factor to control the impact of different classes on the loss.
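For reference, the standard formulation in the bi-tempered loss literature replaces the logarithm and the exponential of the usual softmax cross-entropy with tempered versions (the custom implementation later in this experiment is a simplified variant rather than this exact form):

\log_t(x) = \frac{1}{1 - t}\left(x^{1-t} - 1\right), \qquad
\exp_t(x) = \left[\,1 + (1 - t)\,x\,\right]_{+}^{1/(1-t)}

where [x]_+ denotes max(x, 0); t1 tempers the logarithm in the loss, t2 tempers the exponential in the softmax, and both reduce to the ordinary log and exp as t tends to 1.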
Network Architecture:
Choose an appropriate architecture (e.g., CNN for images) considering the complexity and nature of the data. Include normalization layers and techniques to prevent overfitting.
Custom Loss Function:
Implement the Bi-Tempered Logistic Loss as a custom loss function in TensorFlow/Keras. Define the function with parameters (t1, t2, c) and handle numerical stability.
Model Compilation:
Compile the model with an optimizer (e.g., Adam) using the custom Bi-Tempered Logistic Loss function. Include relevant metrics for monitoring performance (e.g., accuracy).
Training and Evaluation:
Train the model on the training data and monitor performance on validation data. Evaluate the model's performance using metrics (loss, accuracy) on unseen test data.
Hyperparameter Tuning:
Experiment with different values for the temperature parameters (t1, t2, c) to enhance the model's performance.
Considerations:
Ensure data quality, adjust model complexity, and apply regularization techniques. Document experiments and track model performance for iterative improvements.
Deployment:
Save the trained model and consider optimization techniques for deployment in applications.
The focus is on crafting a neural network that effectively learns from data using the Bi-Tempered Logistic Loss, ensuring robustness in the face of noisy or mislabeled data while achieving good performance.
import tensorflow as tf
!pip install -U tensorflow-addons
Requirement already satisfied: tensorflow-addons in /usr/local/lib/python3.10/dist-packages
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages
Requirement already satisfied: typeguard<3.0.0,>=2.7 in /usr/local/lib/python3.10/dist-packages
(remaining pip output truncated)
The provided code accomplishes the following tasks:
Imports: It imports necessary modules from TensorFlow/Keras to build and train neural networks, including components for handling the MNIST dataset.
Bi-Tempered Logistic Loss Function: Defines a custom loss function named bi_tempered_logistic_loss that implements the Bi-Tempered Logistic Loss equation, addressing noisy or mislabeled data by modeling the class distribution flexibly.
Loading and Preprocessing MNIST Dataset: Loads the MNIST dataset, normalizes pixel values to a range between 0 and 1, reshapes the data to comply with a CNN's input shape, and one-hot encodes the labels.
Building the CNN Model: Constructs a Sequential model using Convolutional Neural Network (CNN) layers: two sets of Convolutional layers followed by MaxPooling layers to extract features, and a Flatten layer to prepare for the Dense layers.
Epoch 6/10
1875/1875 [==============================] - 55s 29ms/step - loss: -1.0307 - val_loss
Epoch 7/10
1875/1875 [==============================] - 55s 29ms/step - loss: -1.0326 - val_loss
Epoch 8/10
1875/1875 [==============================] - 55s 29ms/step - loss: -1.0335 - val_loss
Epoch 9/10
1875/1875 [==============================] - 59s 31ms/step - loss: -1.0338 - val_loss
Epoch 10/10
1875/1875 [==============================] - 55s 29ms/step - loss: -1.0359 - val_loss
Test loss: -1.026029348373413
Number of Epochs:
The first code snippet trains the model for 10 epochs, while the second snippet trains for only 5 epochs.
Output:
The output provided in both cases displays the training progress for each epoch, showing the loss values at each epoch. The final line prints the test loss value calculated after the training completes.
Training Time:
Training times might differ due to the variance in the number of epochs. The exact times for each epoch vary between the two executions but should be similar given similar hardware and dataset sizes.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Lambda
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Bi-tempered logistic loss function
def bi_tempered_logistic_loss(y_true, y_pred, t1, t2, c, epsilon=1e-7):
    y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
    temp1 = (tf.math.pow(1 - y_pred, t1) - 1) / t1
    temp2 = (tf.math.pow(1 - y_pred, t2) - 1) / t2
    loss = y_true * (temp1 + temp2) + c * (temp1 * temp2)
    return tf.reduce_sum(loss, axis=-1)

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalize pixel values

# Reshape the data to have a single channel (grayscale)
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build a CNN model with Bi-Tempered Logistic Loss
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='linear'),                       # Use linear activation for logits
    Lambda(lambda x: tf.math.l2_normalize(x, axis=1))     # Normalize the logits
])

# Compile the model with Bi-Tempered Logistic Loss
t1 = 0.8
t2 = 1.2
c = 1.0
model.compile(optimizer='adam',
              loss=lambda y_true, y_pred: bi_tempered_logistic_loss(y_true, y_pred, t1, t2, c))

# Train the model
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

# Evaluate the model
test_loss = model.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {test_loss}")
Epoch 1/5
1875/1875 [==============================] - 56s 29ms/step - loss: -0.9596 - val_loss
Epoch 2/5
1875/1875 [==============================] - 59s 31ms/step - loss: -1.0175 - val_loss
Epoch 3/5
1875/1875 [==============================] - 57s 30ms/step - loss: -1.0250 - val_loss
Epoch 4/5
1875/1875 [==============================] - 55s 29ms/step - loss: -1.0277 - val_loss
Epoch 5/5
1875/1875 [==============================] - 57s 31ms/step - loss: -1.0311 - val_loss
Test loss: -1.026908040046692
The primary difference lies in the optimizer used during compilation. The rest of the code is nearly identical to the previous snippet. rmsprop is another optimizer commonly used in neural network training, but its behavior and convergence might differ from adam. The output and behavior of the model could vary due to this change in optimization algorithms.
If you want to delve deeper into the optimizer differences, adam generally adapts the learning rate during training, while rmsprop also adapts the learning rate but in a slightly different manner. The performance and convergence of the model might be affected by this change, potentially leading to varied results in training dynamics and final accuracy.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Lambda
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Bi-tempered logistic loss function
def bi_tempered_logistic_loss(y_true, y_pred, t1, t2, c, epsilon=1e-7):
    y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
    temp1 = (tf.math.pow(1 - y_pred, t1) - 1) / t1
    temp2 = (tf.math.pow(1 - y_pred, t2) - 1) / t2
    loss = y_true * (temp1 + temp2) + c * (temp1 * temp2)
    return tf.reduce_sum(loss, axis=-1)

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalize pixel values

# Reshape the data to have a single channel (grayscale)
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build a CNN model with Bi-Tempered Logistic Loss
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
(remainder of this code cell is not reproduced in the record)
EXPERIMENT - 8
AIM : Build AlexNet using Advanced CNN.
Description :
Introduction
AlexNet is a groundbreaking convolutional neural network (CNN) architecture that achieved remarkable
success in the ImageNet Large Scale Visual Recognition Challenge in 2012. It played a pivotal role in
popularizing deep learning for image classification tasks. In this document, we'll delve into the process of
building an AlexNet-inspired model using advanced CNN techniques.
Key Components of AlexNet
Convolutional Layers
1. 1st Convolutional Layer : 96 filters, 11x11 kernel size, 4x4 strides, ReLU activation.
2. Max Pooling : 3x3 pool size, 2x2 strides.
3. 2nd Convolutional Layer : 256 filters, 5x5 kernel size, 1x1 strides, ReLU activation.
4. Max Pooling : 3x3 pool size, 2x2 strides.
5. 3rd Convolutional Layer : 384 filters, 3x3 kernel size, 1x1 strides, ReLU activation.
6. 4th Convolutional Layer : 384 filters, 3x3 kernel size, 1x1 strides, ReLU activation.
7. 5th Convolutional Layer : 256 filters, 3x3 kernel size, 1x1 strides, ReLU activation.
8. Max Pooling : 3x3 pool size, 2x2 strides.
Model Compilation
• Loss Function: Categorical Cross entropy
• Optimizer: Adam
• Metrics: Accuracy
DATASET:
MNIST Dataset Overview : The MNIST dataset is a widely used collection of handwritten digit images,
commonly employed as a benchmark in the field of machine learning and computer vision. Here's a
comprehensive description of the MNIST dataset:
Overview:
• Name: MNIST (Modified National Institute of Standards and Technology)
• Nature: Image Classification
• Dataset Size:
• Training Set: 60,000 images
• Test Set: 10,000 images
• Image Size: 28x28 pixels
• Classes: 10 (Digits 0 through 9)
• Source: Originally created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges from NIST data, modified for machine learning experiments.
Characteristics:
1.Image Content: Each image in the MNIST dataset is a grayscale image of a handwritten digit.
2.Labeling : Every image is associated with a label indicating the digit it represents (0 through 9).
3.Digit Variety : The dataset includes a diverse set of digits, capturing variations in writing styles.
4.Size Consistency : All images are resized to a standard size of 28x28 pixels, providing consistency for
machine learning models.
5.Gray Scale: Images are in grayscale, with pixel values ranging from 0 to 255
Source Code:
This code block imports various Python libraries and modules commonly used in machine learning, particularly for
working with image data and building neural networks using the Keras framework. Here's a brief description of each:
numpy: A library for numerical operations, particularly useful for handling arrays and matrices.
pandas: A data manipulation library that provides data structures like DataFrames, making it easy to work with
structured data.
keras: A high-level deep learning library. It provides an easy-to-use interface for building and training neural networks.
img_to_array, array_to_img: Functions for converting images to arrays and vice versa.
train_test_split: Function from scikit-learn for splitting datasets into training and testing sets.
Various layers (Dense, Dropout, Flatten, Input, Conv2D, MaxPooling2D, AveragePooling2D, BatchNormalization):
Different types of layers used in building neural networks.
classification_report: A function from scikit-learn for generating a classification report, which includes precision,
recall, and F1-score.
Importing the Model class from Keras, which allows building more complex neural network architectures than the
simple sequential model.
In this section of the code, the MNIST dataset is being loaded using Keras. MNIST is a well-known dataset of
handwritten digits, commonly used for training and testing machine learning models
This line loads the MNIST dataset using Keras. The dataset is split into training and testing sets, with images and
corresponding labels.
y_train and y_test contain the corresponding labels (the digit each image represents).
This line creates a new variable y_true and assigns it the values of y_test. y_true is commonly used in machine
learning contexts to represent the true (actual) labels.
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
y_true = y_test
This code block is preparing the MNIST dataset for input into a neural network by reshaping the images. Here's an
explanation of the code:
These lines define the dimensions of the images in the MNIST dataset. Each image is 28 pixels in height and 28 pixels
in width, forming a 28x28 pixel grid. The input_shape variable is set to (28, 28, 1), indicating the dimensions of each
image with a single channel (grayscale).
These lines reshape the image arrays (x_train and x_test) to match the specified input_shape. The reshape function
is used to rearrange the dimensions of the arrays. The new shape is set to (number of samples, height, width,
channels). In this case, it's (number of samples, 28, 28, 1) to match the expected input shape for a Convolutional
Neural Network (CNN).
These lines print the shapes of the training and testing sets after the reshaping process. It helps to verify that the
reshaping was successful and that the input shapes match the expected format for a CNN. The shapes of x_train and
x_test should now be (number of samples, 28, 28, 1), and the shapes of y_train and y_test represent the labels for
the corresponding sets.
This preparation is crucial for feeding the data into a convolutional neural network (CNN) since CNNs expect input
data in a specific format, especially when working with image data
print("Train set shape", x_train.shape, 'trainlabel shape', y_train.shape)print('test set shape', x_test.shape, 'test
labels:', y_test.shape)
This code block splits the data into a new training set (x_train and y_train), a validation set (x_val and y_val), and a smaller test set (x_test and y_test). Here's a breakdown of the code:
The first line uses the train_test_split function from scikit-learn to split the original MNIST test set (x_test and y_test) into a new training set (x_train and y_train) and a validation set (x_val and y_val). The test_size parameter is set to 0.2, indicating that 20% of the data will be used for validation.
The next line further carves a smaller test set (x_test and y_test) out of the new training set (x_train and y_train) by selecting the first 5000 samples. This smaller test set is likely used for quick testing or validation purposes.
print('X_train shape:', x_train.shape, 'X_label shape:', y_train.shape)
print('Val_set shape:', x_val.shape, 'val_label shape:', y_val.shape)
print('Test_set shape:', x_test.shape, 'y_test shape:', y_test.shape)
These lines print the shapes of the newly created training, validation, and test sets. It's a useful step to verify that the
data has been split
correctly, and the shapes match the expectations. The shapes printed indicate the number of samples and the
dimensions of each sample for the respective sets.
y_test=y_train[:5000]
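A hypothetical reconstruction of the split described above (the original cell is only partially reproduced, and x_test would need the matching assignment):

from sklearn.model_selection import train_test_split

# Split the 10,000-image MNIST test set 80/20 into a new training set and a validation set,
# then reuse the first 5,000 samples of the new training set as a small test set.
x_train, x_val, y_train, y_val = train_test_split(x_test, y_test, test_size=0.2)
x_test = x_train[:5000]
y_test = y_train[:5000]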
This code block is performing normalization on the pixel values of the images in the training, validation, and test sets.
Normalization is a common preprocessing step in machine learning, especially for image data. Here's an explanation
of the code:
For each set (x_train, x_val, and x_test), the pixel values are normalized. The normalization process involves
subtracting the mean and dividing by the standard deviation.
x_train.mean() and x_train.std() calculate the mean and standard deviation of the pixel values in the training set.
Similarly, x_val.mean() and x_val.std() are used for the validation set, and x_test.mean() and x_test.std() for the test set.
The result is that each set is transformed so that its pixel values have a mean of approximately 0 and a standard
deviation of approximately 1. Normalizing the data helps in training neural networks by ensuring that the features
(pixel values in this case) are on a similar scale, which can lead to better convergence during training.
This normalization process is a common practice in machine learning to improve the stability and performance of the
training process.
# normalization of data
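The normalization described above corresponds to:

x_train = (x_train - x_train.mean()) / x_train.std()
x_val = (x_val - x_val.mean()) / x_val.std()
x_test = (x_test - x_test.mean()) / x_test.std()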
This code block appears to be preparing the data for a neural network model, specifically for image classification.
Here's an overall description:
num_labels = 10
This line sets the variable num_labels to 10, suggesting that the dataset involves classifying images into 10 different
categories.
im_row = 227
im_col = 227
These lines set the dimensions (im_row and im_col) to 227x227 pixels, indicating the desired size for the images in the
dataset.
def reformat(dataset):
This function, reformat, takes a dataset of images and applies the following operations:
Uses array_to_img and img_to_array from Keras to convert the array back to an image and then resizes it to the
specified dimensions (227x227 pixels). The result is a reformatted dataset of images.
The training set labels (y_train) are one-hot encoded using to_categorical from Keras, converting them into binary
vectors.
The training set images (x_train) are then reformatted using the reformat function. The shapes of the reformatted
training set and its labels are printed.
Similar operations are performed for the test set (x_test and y_test):
print('test set shape:', x_test.shape, 'test label shape', y_test.shape)
y_val = keras.utils.to_categorical(y_val)
x_val = reformat(x_val)
Lastly, the same operations are applied to the validation set (x_val and y_val).
num_labels = 10
im_col = 227
def reformat(dataset):
test set shape: (5000, 227, 227, 1) test label shape (5000, 10)
val set shape: (2000, 227, 227, 1) val_lavels shape: (2000, 10)
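Since only fragments of this cell are reproduced above, here is a sketch of what the reformatting step could look like (the loop body, the scale flag, and the use of PIL resizing are assumptions consistent with the description):

import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array

num_labels = 10
im_row = 227
im_col = 227

def reformat(dataset):
    # Convert each 28x28 array back to an image, resize it to 227x227, and convert it back to an array.
    images = []
    for img in dataset:
        pil_img = array_to_img(img.reshape(28, 28, 1), scale=False)
        pil_img = pil_img.resize((im_col, im_row))
        images.append(img_to_array(pil_img))
    return np.array(images)

y_train = keras.utils.to_categorical(y_train)
x_train = reformat(x_train)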
AlexNet Architecture
This code defines the architecture of the AlexNet model using the Keras framework. AlexNet is a well-known deep
convolutional neural network architecture designed for image classification tasks. Here's a breakdown of the code:
batch_size = 32
num_classes = 10
epochs = 50
These variables define parameters for training the model, such as the batch size, the number of classes, and the
number of epochs (iterations through the entire dataset during training).
model = Sequential()
This line initializes a sequential model, which is a linear stack of layers in Keras.
The first convolutional layer with 96 filters, an input shape of (227, 227, 1) (representing grayscale images), a kernel
size of (11, 11), and a stride of (4, 4). ReLU (Rectified Linear Unit) is used as the activation function.
Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
Max pooling layer following the first convolutional layer with a pool size of (3, 3) and a stride of (2, 2). The subsequent
layers follow a similar pattern:
Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.4))
2nd Fully Connected Layer
model.add(Dense(4096, activation='relu'))
Add Dropout
model.add(Dropout(0.4))
model.add(Dense(1000, activation='relu'))
Add Dropout
model.add(Dropout(0.4))
Finally, the output layer with the number of classes defined earlier:
Output Layer
model.add(Dense(num_classes, activation='softmax'))
The model is compiled using categorical cross-entropy as the loss function, the Adam optimizer, and accuracy as the
evaluation metric:
This code defines the architecture of the AlexNet model and prepares it for training on a dataset with 10 classes.
batch_size = 32
num_classes = 10
epochs = 50
model = Sequential()
# Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
# Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
# Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
# Add Dropout
model.add(Dropout(0.4))
model.add(Dense(1000, activation='relu'))
# Add Dropout
model.add(Dropout(0.4))
# Output Layer
model.add(Dense(num_classes, activation='softmax'))
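Because the Conv2D lines themselves are not reproduced in the fragments above, here is a sketch of the full model as described in the Key Components section (the padding choices are assumptions made to keep the layer sizes consistent, and the Keras layer imports are assumed from earlier):

batch_size = 32
num_classes = 10
epochs = 50

model = Sequential()

# 1st Convolutional layer: 96 filters, 11x11 kernel, 4x4 strides
model.add(Conv2D(96, input_shape=(227, 227, 1), kernel_size=(11, 11), strides=(4, 4), activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

# 2nd Convolutional layer: 256 filters, 5x5 kernel
model.add(Conv2D(256, kernel_size=(5, 5), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

# 3rd, 4th and 5th Convolutional layers: 384, 384, 256 filters, 3x3 kernels
model.add(Conv2D(384, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(Conv2D(384, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

# Fully connected layers with dropout
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(1000, activation='relu'))
model.add(Dropout(0.4))

# Output layer
model.add(Dense(num_classes, activation='softmax'))

# Compile with categorical cross-entropy, the Adam optimizer, and accuracy as the metric
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])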
The model.summary() method provides a concise summary of the architecture and parameters of the defined neural
network
model.summary()
Model: "sequential"
=================================================================
... (layer-by-layer summary output truncated)
This line of code is using the fit method to train the AlexNet model on the provided training data (x_train and y_train).
Here's a breakdown of the parameters:
batch_size: The number of samples per gradient update. In this case, it's set to 32, meaning the model will be updated
after processing each batch of 32 samples.
epochs: The number of epochs (iterations over the entire training dataset) for training the model. In this case, it's set
to 50.
verbose: Controls the amount of information printed during training. A value of 1 prints a progress bar and loss information for each epoch.
validation_data: A tuple containing validation data to be used during training. In this case, it's specified as (x_val,
y_val).
The fit method trains the model on the provided data, updates the model parameters, and returns a history object
(hist) that contains information about the training process, such as the training and validation loss and accuracy for
each epoch
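Reconstructed from the parameters described, the training call is roughly:

hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_val, y_val))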
This code block is evaluating the trained AlexNet model on the test dataset (x_test and y_test). Here's a breakdown of
the code:
The evaluate method is used to evaluate the model on the test data. It returns a list containing the test loss and test accuracy. The verbose=1 argument prints a progress bar while the evaluation runs.
The print lines then report the test loss and test accuracy obtained from the evaluation; the values are extracted from the score list.
After executing this code block, you will get printed output indicating the test loss and accuracy of the trained model
on the test dataset. The test accuracy represents the percentage of correctly classified samples in the test set, while
the test loss is a measure of the model's
performance on the test data.
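A sketch of that evaluation cell:

score = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])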
This code block is using Matplotlib to visualize the training and validation accuracy as well as the training and
validation loss over the epochs. Here's a breakdown of the code:
These lines extract the training and validation accuracy, as well as the training and validation loss, from the training
history (hist) obtained during the model training.
plt.legend()
plt.figure()
The first plt.plot line creates a plot of training accuracy with blue dots, and the second line creates a plot of validation
accuracy with a solid blue line. The plt.title function adds a title to the accuracy plot, and plt.legend adds a legend to
distinguish between training and validation accuracy.
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
Similarly, these lines create two plots: one for training loss and another for validation loss. The blue dots represent
training loss, and the solid blue line represents validation loss. Titles and legends are added to make the plots more
informative.
After executing this code block, it will generate two plots showing the training and validation accuracy and loss over
the epochs. These visualizations can help you assess the model's performance during training, identify overfitting or
underfitting, and make decisions on whether further adjustments are needed.
accuracy = hist.history['acc']
plt.legend()plt.figure()
This code block is making predictions on the test data using the trained AlexNet model and then separating the indices
of correctly and incorrectly classified samples. Here's a breakdown of the code:
predicted_classes = model.predict_classes(x_test)
The predict_classes method is used to obtain the predicted class labels for the test data (x_test). This generates an
array of predicted classes for each sample in the test set.
The np.nonzero function is used to find the indices where the predicted classes match the true labels (correct) and
where they do not match (incorrect). This helps in separating the indices of correctly and incorrectly classified samples.
After executing this code block, the correct and incorrect arrays contain the indices of samples in the test set that were
correctly and incorrectly classified, respectively. These indices can be useful for further analysis or visualization, such as
inspecting specific images to understand model performance.
predicted_classes = model.predict_classes(x_test)
correct = np.nonzero(predicted_classes==y_true)[0]
incorrect = np.nonzero(predicted_classes!=y_true)[0]
This code block is using scikit-learn's classification_report function to generate a detailed classification report, which
includes precision, recall, and F1-score, for the predictions made by the AlexNet model on the test data. Here's a
breakdown of the code:
This line creates a list of class names for the classification report. The list is based on the number of classes
(num_classes), with each class represented as "Class 0", "Class 1", ..., "Class 9".
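Based on that description, the line would be:
target_names = ["Class {}".format(i) for i in range(num_classes)]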
print(classification_report(y_true, predicted_classes, target_names=target_names))
The classification_report function is used to generate a detailed classification report. It takes the true labels (y_true) and
predicted labels (predicted_classes) as inputs. The target_names parameter is set to the list of class names created
earlier.
The printed classification report provides metrics such as precision, recall, and F1-score for each class, as well as overall
performance metrics. It's a valuable tool for assessing the model's performance on individual classes and gaining insights
into its strengths and weaknesses across different categories.
This code block is using Matplotlib to create a visual representation of correctly classified images from the test set. It
plots a 3x3 grid of images with their predicted and true class labels. Here's a breakdown of the code:
for i, c in enumerate(correct[:9]):
plt.subplot(3,3,i+1)
The for loop iterates over the indices of correctly classified samples (correct[:9]) and uses enumerate to get both the
index (i) and the corresponding index value (c).
plt.subplot(3,3,i+1)sets up a subplot in a 3x3 grid, and plt.imshow displays the image using a grayscale colormap
(cmap='gray'). The title of each subplot includes the predicted class and the true class labels for better understanding.
plt.tight_layout() is used to improve the layout spacing. After executing this code block, you should see a 3x3 grid of
correctly classified images from the test set, with each subplot displaying an image along with its predicted and true
class labels.
for i, c in enumerate(correct[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(x_test[c].reshape(227,227), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[c], y_true[c]))
    plt.tight_layout()
This code block is using Matplotlib to create a visual representation of incorrectly classified images from the test set. It
plots a 3x3 grid of images with their predicted and true class labels. Here's a breakdown of the code:
plt.subplot(3,3,i+1)
The for loop iterates over the indices of incorrectly classified samples (incorrect[:9]) and uses enumerate to get both the
index (i) and the corresponding index value (c).
plt.subplot(3,3,i+1)sets up a subplot in a 3x3 grid, and plt.imshow displays the image using a grayscale colormap
(cmap='gray'). The title of each subplot includes the predicted class and the true class labels for better understanding.
plt.tight_layout() is used to improve the layout spacing. After executing this code block, you should see a 3x3 grid of
incorrectly classified images from the test set, with each subplot displaying an image along with its predicted and true
class labels.
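The code itself is not reproduced in the record; a sketch mirroring the correctly classified grid above would be:
for i, c in enumerate(incorrect[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(x_test[c].reshape(227,227), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[c], y_true[c]))
    plt.tight_layout()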
This code block displays a single image from the training set; the image is reshaped and shown using Matplotlib.
EXPERIMENT – 9
AIM : Demonstration of Application of Autoencoders
Description:
AUTOENCODERS:
Autoencoders are data-encoding techniques based on unsupervised artificial neural networks. This special
type of ANN is trained to encode the data in such a way that the data is represented in a compressed form.
Autoencoders are also trained to decode the data so that the original data can be reconstructed as closely as
possible.
Architecture of Autoencoders:
The architectures of autoencoders are varied. In this section, LSTM autoencoders are discussed. LSTM-based
autoencoders are used to encode and decode sequence data.
Building a predictive model for sequence data involves a sequence of operations, and such problems are
therefore called sequence-to-sequence problems. Autoencoders are a natural choice for handling sequence-to-
sequence problems.
Overview:
• Name: MNIST (Modified National Institute of Standards and Technology)
• Nature: Image Classification
• Dataset Size:
• Training Set: 60,000 images
• Test Set: 10,000 images
• Image Size: 28x28 pixels
• Classes: 10 (Digits 0 through 9)
• Source: Originally created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for NIST, modified
for machine learning experiments.
Characteristics:
1.Image Content: Each image in the MNIST dataset is a grayscale image of a handwritten digit.
2.Labeling : Every image is associated with a label indicating the digit it represents (0 through 9).
3.Digit Variety : The dataset includes a diverse set of digits, capturing variations in writing styles.
4.Size Consistency : All images are resized to a standard size of 28x28 pixels, providing consistency for machine
learning models.
5.Gray Scale: Images are in grayscale, with pixel values ranging from 0 to 255
Source Code:
Purpose:
It creates an LSTM autoencoder model to recreate a given input sequence. It is designed for sequential
data, where order matters (like time series or text).
Key Steps:
Step 1:
Import libraries:
numpy for array operations
STEP 2:
STEP 3:
Reshape input:
Converts the sequence into a 3D shape suitable for LSTM (samples, timesteps, features).
STEP 4:
LSTM layer (100 units, relu activation) to encode the input sequence.
RepeatVector layer to repeat the encoded representation for output generation, followed by an LSTM decoder layer that returns sequences.
STEP 5:
STEP 6:
Fits the model to the input sequence, aiming to learn a representation that can recreate it; the reshaped sequence of
len(sequence) time steps serves as both input and target.
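A minimal sketch of the reconstruction LSTM autoencoder these steps describe; the 100-unit relu LSTM and the RepeatVector layer follow the steps above, while the example sequence values and the number of training epochs are assumptions:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])   # assumed example sequence
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))                         # (samples, timesteps, features)

model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in, 1)))    # encoder
model.add(RepeatVector(n_in))                                     # repeat the encoding for each output step
model.add(LSTM(100, activation='relu', return_sequences=True))    # decoder
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.fit(sequence, sequence, epochs=300, verbose=0)              # learn to recreate the input
print(model.predict(sequence, verbose=0))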
Like reconstruction, autoencoders can also be used to predict a sequence; the code is as given below:
Purpose:
It creates an LSTM autoencoder model to predict the next values in a given sequence. It learns patterns in
sequential data to anticipate future values.
Key Steps:
STEP 1:
Import libraries:
numpy for array operations
STEP 2:
STEP 3:
Reshape input:
Converts the sequence into a 3D shape suitable for LSTM (samples, timesteps, features).
STEP 4:
STEP 5:
STEP 7:
STEP 8:
STEP 9:
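A minimal sketch of the prediction variant these steps describe, reusing the same example sequence; here the targets are the sequence values shifted one step ahead (sequence values, epoch count, and the shift-by-one target are assumptions):
seq_in = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).reshape((1, 9, 1))
seq_out = seq_in[:, 1:, :]                                        # targets: the next value at each step
n_out = seq_out.shape[1]

model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(9, 1)))       # encoder
model.add(RepeatVector(n_out))                                    # repeat the encoding for each predicted step
model.add(LSTM(100, activation='relu', return_sequences=True))    # decoder
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.fit(seq_in, seq_out, epochs=300, verbose=0)
print(model.predict(seq_in, verbose=0))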
Let’s train an autoencoder on the MNIST data set using a simple feed-forward neural network. Code:
Once the autoencoder is trained on the MNIST data set, anomaly detection can be done using two different
images. First, one of the images from the MNIST data set is chosen and fed to the trained autoencoder.
Since this image is not an anomaly, the error (loss) is expected to be very low. Next, when some random image
is given as the test image, the loss is expected to be very high, as it is an anomaly.
A simple 6-layered autoencoder is built to train on the MNIST data.
Purpose:
Key Steps:
STEP 1:
Import libraries:
keras for building and training the autoencoder model
STEP 2:
STEP 3:
Encoder:
Dense layer (512 units, Relu activation)
Decoder:
Dense layer (128 units, elu activation)
STEP 4:
STEP 5:
STEP 6:
Creates a separate model that takes an input and outputs the bottleneck representation.
Generate reconstructions:
Uses the full autoencoder model to reconstruct the original images from the compressed
representations.
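The model-definition code itself is not reproduced in the record; a minimal sketch consistent with the 512/128 layer sizes above and the encoding_dim of 10 used below might look like this (the exact layer order, the output activation, and the training settings are assumptions):
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                       # flattened 28x28 MNIST image
enc = Dense(512, activation='relu')(input_img)        # encoder layer 1
enc = Dense(128, activation='relu')(enc)              # encoder layer 2
bottleneck = Dense(10, activation='relu')(enc)        # compressed representation (encoding_dim = 10)
dec = Dense(128, activation='elu')(bottleneck)        # decoder layer 1
dec = Dense(512, activation='elu')(dec)               # decoder layer 2
decoded = Dense(784, activation='sigmoid')(dec)       # reconstructed image

autoencoder = Model(input_img, decoded)
encoder = Model(input_img, bottleneck)
autoencoder.compile(optimizer='adam', loss='mse')     # loss choice assumed
autoencoder.fit(train_x, train_x, epochs=10, batch_size=256, shuffle=True)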
encoded_data = encoder.predict(train_x)          # bottleneck representation
decoded_output = autoencoder.predict(train_x)    # reconstruction
encoding_dim = 10
encoded_input = Input(shape=(encoding_dim,))
decoder = autoencoder.layers[-3](encoded_input)
decoder = autoencoder.layers[-2](decoder)
decoder = autoencoder.layers[-1](decoder)
decoder = Model(encoded_input, decoder)
drive.mount('/content/drive', force_remount=True)
Mounted at /content/drive
Purpose:
It tests the performance of the trained autoencoder model on an image that's not from the
MNIST dataset. It measures the reconstruction error to assess how well the model generalizes to unseen
data.
Key Steps:
STEP 1:
Converts it to grayscale.
STEP 2:
STEP 3:
Uses the trained autoencoder model to reconstruct the image from its compressed
representation.
STEP 4:
This measures how much information was lost during compression and reconstruction.
STEP 5:
# %matplotlib inline
from keras.preprocessing import image
# If the image is not one of the MNIST images the model was trained on, the reconstruction error is expected to be high
img = image.load_img("/content/drive/MyDrive/DLT/don-joshuva.jpeg", target_size=(28, 28), color_mode="grayscale")
input_img = image.img_to_array(img)
inputs = input_img.reshape(1, 784)
target_data = autoencoder.predict(inputs)
dist = np.linalg.norm(inputs - target_data, axis=-1)
print(dist)
EXPERIMENT – 10
AIM: Demonstration of Application of Generative Adversarial Networks (GAN)
DESCRIPTION:
GAN:
Generative Adversarial Networks, or GANs, represent a cutting-edge approach to generative modeling within
deep learning, often leveraging architectures like convolutional neural networks. The goal of generative
modeling is to autonomously identify patterns in input data, enabling the model to produce new examples
that feasibly resemble the original dataset.
GANs tackle this challenge through a unique setup, treating it as a supervised learning problem involving two
key components: the generator, which learns to produce novel examples, and the discriminator, tasked with
distinguishing between genuine and generated instances. Through adversarial training, these models engage
in a competitive interplay until the generator becomes adept at creating realistic samples, fooling the
discriminator approximately half the time.
This dynamic field of GANs has rapidly evolved, showcasing remarkable capabilities in generating lifelike
content across various domains. Notable applications include image-to-image translation tasks and the
creation of photorealistic images indistinguishable from real photos, demonstrating the transformative
potential of GANs in the realm of generative modeling.
ARCHITECTURE:
The architecture of a deep learning GAN model consists of two modules, namely the generator and the discriminator.
1. A generator model:
• It is the learning component of a GAN model.
• It learns to generate new data by incorporating the feedback received from the discriminator.
• It learns to make the discriminator classify its newly generated data as real.
• Hence, training the generator also requires the discriminator to be considered.
2. A discriminator model:
• It is a classifier in GANs.
• Its job is to distinguish the output of the generator (newly generated data) from the real data.
GAN ARCHITECTURE
DATASET:
MNIST Dataset Overview : The MNIST dataset is a widely used collection of handwritten digit images,
commonly employed as a benchmark in the field of machine learning and computer vision. Here's a
comprehensive description of the MNIST dataset:
Overview:
• Name: MNIST (Modified National Institute of Standards and Technology)
• Nature: Image Classification
• Dataset Size:
• Training Set: 60,000 images
• Test Set: 10,000 images
• Image Size: 28x28 pixels
• Classes: 10 (Digits 0 through 9)
• Source: Originally created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges for NIST, modified
for machine learning experiments.
Source Code:
Importing Libraries
import tensorflow as tf
TensorFlow is imported for building and training neural networks. Specific layers and models from the Keras API (which is
integrated into TensorFlow) are imported. NumPy is imported for numerical operations. Matplotlib is imported for
plotting graphs and visualizations.
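Only the TensorFlow import is reproduced above; a sketch of the remaining imports implied by this description and by the code below (the exact layer list is an assumption):
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input
from tensorflow.keras.optimizers import Adam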
The MNIST dataset is loaded. It contains 28x28 pixel grayscale images of handwritten digits (0-9). The images are
normalized to the range [0, 1] by dividing by 255.
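A sketch of the loading and normalisation step described above (only the training images are needed for GAN training):
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0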
# Generator model
generator = Sequential([
    Dense(28 * 28, activation='sigmoid', input_shape=(100,)),  # dense layer mapping the noise vector to pixels (activation assumed)
    Reshape((28, 28))
])
The generator and discriminator models are defined using the Sequential API. The generator takes a random noise vector
of size 100, passes it through a dense layer, reshapes it to the size of an image (28x28). The discriminator takes an image,
flattens it, passes it through dense layers, and produces a binary output indicating whether the input is real or
generated.
# Discriminator model
discriminator = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),   # hidden layer width assumed
    Dense(1, activation='sigmoid')   # real vs. generated
])
The discriminator is compiled with binary cross-entropy loss and Adam optimizer. The GAN is created by combining the
generator and discriminator. Discriminator's weights are frozen during GAN training. The GAN is compiled with binary
cross-entropy loss and Adam optimizer.
gan_output = discriminator(x)
gan.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0002))
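The record reproduces only the last two lines of this block; a minimal sketch of the full compile-and-combine step, consistent with the description above, might be (the intermediate name gan_input is an assumption):
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0002))
discriminator.trainable = False                    # freeze discriminator weights during GAN training
gan_input = Input(shape=(100,))                    # random noise vector
x = generator(gan_input)                           # generated image
gan_output = discriminator(x)                      # real/fake score for the generated image
gan = Model(gan_input, gan_output)
gan.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0002))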
for e in range(epochs):
for _ in range(batch_count):
discriminator.trainable = True
plot_generated_images(e, generator)
generated_images = generator.predict(noise)
plt.tight_layout()
plt.savefig(f'gan_generated_image_epoch_{epoch}.png')
plt.show()
train_gan(100, 128)
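The full training loop appears only as fragments above; a minimal sketch of a plot_generated_images helper and a train_gan function consistent with those fragments is given below (batch selection, label values, and the reported loss format are assumptions):
def plot_generated_images(epoch, generator, examples=16, dim=(4, 4)):
    noise = np.random.normal(0, 1, size=(examples, 100))
    generated_images = generator.predict(noise, verbose=0)
    plt.figure(figsize=(6, 6))
    for i in range(examples):
        plt.subplot(dim[0], dim[1], i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.tight_layout()
    plt.savefig(f'gan_generated_image_epoch_{epoch}.png')
    plt.show()

def train_gan(epochs, batch_size):
    batch_count = x_train.shape[0] // batch_size
    for e in range(epochs):
        for _ in range(batch_count):
            # Train the discriminator on a mix of real and generated images
            noise = np.random.normal(0, 1, size=(batch_size, 100))
            generated_images = generator.predict(noise, verbose=0)
            real_images = x_train[np.random.randint(0, x_train.shape[0], batch_size)]
            discriminator.trainable = True
            d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
            d_loss_fake = discriminator.train_on_batch(generated_images, np.zeros((batch_size, 1)))
            # Train the generator through the combined GAN (discriminator frozen)
            discriminator.trainable = False
            noise = np.random.normal(0, 1, size=(batch_size, 100))
            g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
        print(f'Epoch {e + 1}  D Loss: {0.5 * (d_loss_real + d_loss_fake):.4f}  G Loss: {g_loss:.4f}')
        plot_generated_images(e, generator)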
As training progresses, you should observe a decrease in the D Loss and an increase in the G Loss. This indicates that the
discriminator is getting better at distinguishing real from generated data, and the generator is improving at generating
realistic data.
If the D Loss becomes very low (close to zero), it might indicate that the discriminator is too strong or that the generator
is not effective at fooling it. You may need to adjust the architecture or training parameters.
If the G Loss becomes very high, it might indicate that the generator is not making significant progress in generating
realistic data. You might need to adjust the architecture, hyperparameters, or training strategy.
The visualizations of generated images can provide a qualitative assessment of the quality of generated data. You should
look for clear and recognizable patterns in the generated images as training progresses.
It's important to note that GAN training can be complex, and interpreting the output requires a balance between
quantitative measures (loss values) and qualitative inspection of generated samples. You may need to experiment with
different hyperparameters and network architectures to achieve the desired results.
EXERCISE – 11
AIM: TRAFFIC-SIGN RECOGNITION SYSTEM
Description:
Problem statement:
The problem is to develop a traffic-sign recognition system which can recognize the traffic signs put up on the
road, e.g. "speed limit", "children" or "turn ahead". Given the traffic signs as input images, the
problem is to recognize the signs using machine learning techniques. To solve the problem, the following are
provided:
A huge collection of traffic-sign images taken under different scenes is available as input. These signs may not be
clearly visible and are challenging to process, as they are taken from far away.
A separate set of images for testing the model is available.
Use the available data to develop a traffic-sign recognition system which can categorize
signs, i.e., classify which class each traffic sign belongs to.
Project implementation: Traffic signs are of different types, like speed limits, traffic signals, turn left or right,
etc. The traffic-sign recognition problem can therefore be treated as a traffic-sign classification problem. Since the traffic signs
might have been captured from far away, the model we build should be able to detect them accurately.
We use a deep learning technique which can extract the features accurately and predict the sign class. Sign
detection methods are based on features like colour and shape. To extract the features from these complex
images, a deep learning technique, the Convolutional Neural Network, is used together with image processing techniques.
A Convolutional Neural Network (CNN): It is a type of Deep Learning neural network architecture commonly
used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to
understand and interpret the image or visual data. Convolutional Neural Network consists of multiple layers
like the input layer, Convolutional layer, Pooling layer, and fully connected layers. The Convolutional layer
applies filters to the input image to extract features, the Pooling layer down samples the image to reduce
computation, and the fully connected layer makes the final prediction. The network learns the optimal filters
through backpropagation and gradient descent.
-Dataset used in the project:
To implement this project, a traffic-sign data set is used. This data set can be downloaded from Kaggle. The data
set used here is from the German Traffic Sign Benchmark. The data set description is as follows:
Before implementing this project, ensure that the following necessary packages are installed:
First, download the data set and extract the files into a directory. The extracted data set contains 3 folders and 3
.csv files. The Meta, Train and Test folders contain the images for the target classes, training and testing respectively.
The Meta, Train and Test .csv files contain the image paths, image IDs and other information.
The method used to build and evaluate the model for traffic-sign categorization is as given below:
Source code:
Exploring the Dataset
Step1: Import the necessary files
!pip install opencv-python
metaDf.sample(10)
24169  43  44  5  6  38  39  C:/Users/vanam/OneDrive/  16
17636  55  52  6  5  50  47  C:/Users/vanam/OneDrive/  11
18839  71  72  7  7  65  66  C:/Users/vanam/OneDrive/  12
19843  37  39  5  6  32  33  C:/Users/vanam/OneDrive/  12
14477  39  39  6  5  34  34  C:/Users/vanam/OneDrive/  9
The data set may be imbalanced, meaning that the number of samples available for each class may not be the
same. So it is advisable to analyze the class distribution so that training and validation of the model can
be planned accordingly.
To check the data set for imbalance, a histogram plot is used. From the output below, we can see that the
class distribution for this data set is not uniform. The Seaborn library is used to plot and visualize the
histogram. We can also observe that the train and test subsets have a very similar class imbalance.
fig, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(25, 6))
axs[0].set_title('Train classes distribution')
axs[0].set_xlabel('Class')
axs[0].set_ylabel('Count')
axs[1].set_title('Test classes distribution')
Step6: Analyze the size distribution of images. Since the accuracy of the sign recognition depends on the
quality of the images, it is necessary to know the resolution of the images.
A KDE plot, also called a Kernel Density Estimate, is used for visualizing the PDF (Probability Density Function)
of a continuous variable. Seaborn is used to draw the KDE plot. The plot below shows the probability density
at different values of a continuous variable.
A multivariate plot is used to visualize the width and height of the images. This dataset contains
thousands of images, but not all with the same resolution. We can see from the output below that most of the
images are rectangular, most are about 35x35 pixels in resolution, and a few samples have a higher
resolution, around 100x100 pixels.
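A minimal sketch of the plots this step describes, assuming the training metadata is held in a DataFrame trainDf with Width and Height columns:
import seaborn as sns
sns.kdeplot(trainDf['Width'])                                    # univariate KDE of image widths
plt.show()
sns.jointplot(data=trainDf, x='Width', y='Height', kind='kde')   # joint distribution of width and height
plt.show()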
Step7: Visualize the target class. Each target class in this data set is itself available as an image indicating the
sign. Some of these images may differ from the dataset samples, so a few target-class samples are visualized.
rows = 6
cols = 8
fig, axs = plt.subplots(rows, cols, sharex=True, sharey=True, figsize=(25, 12))
plt.subplots_adjust(left=None, bottom=None, right=None, top=0.9, wspace=None, hspace=None)
metaDf = metaDf.sort_values(by=['ClassId'])
idx = 0
for i in range(rows):
break
img[np.where(img[:,:,3]==0)] = [255,255,255,255]
axs[i,j].imshow(img)
axs[i,j].set_facecolor('xkcd:salmon')
axs[i,j].set_title(labels[int(metaDf["ClassId"].tolist()[idx])])
axs[i,j].get_xaxis().set_visible(False)
axs[i,j].get_yaxis().set_visible(False)
idx += 1
Step8: Visualize the training set. A few images of the training set are also visualized.
rows = 10
cols = 10
fig, axs = plt.subplots(rows, cols, sharex=True, sharey=True, figsize=(25, 12))
print(cur_path)
idx = 0
for i in range(rows):
    for j in range(cols):
        #print(path)
        axs[i,j].imshow(img)
        axs[i,j].set_title(labels[int(trainDf["ClassId"].tolist()[idx])])
        axs[i,j].get_xaxis().set_visible(False)
        axs[i,j].get_yaxis().set_visible(False)
        idx += 1
Model building using Convolutional Neural Network: Keras is used for model building. For presentation
purposes, the model-building Python code is presented separately. The code and its description are as follows:
Step1: Import the necessary files. All the necessary files required for model building, data manipulation,
visualization and image processing must be imported.
Step3: Read the images and convert them to NumPy arrays. We need to convert the list into NumPy arrays before feeding
it to the model.
Step5: Convert the labels using one-hot encoding. The to_categorical method from the keras.utils package is used to convert
the labels present in y_train and y_test into one-hot encoding.
#Converting the labels into one hot encoding
y_train = to_categorical(y_train, 43)
y_test = to_categorical(y_test, 43)
Step6: Build the CNN model. A CNN model is used to classify the images into their corresponding categories.
The description of the CNN architecture used here is:
2 Conv2D layers (filters=32, kernel_size=(5,5), activation="relu")
MaxPool2D layer (pool_size=(2,2))
Dropout layer (rate=0.25)
2 Conv2D layers (filters=64, kernel_size=(3,3), activation="relu")
MaxPool2D layer (pool_size=(2,2))
Dropout layer (rate=0.25)
Flatten layer to squeeze the layers into 1 dimension
Dense fully connected layer (256 nodes, activation="relu")
Dropout layer (rate=0.5)
Dense layer (43 nodes, activation="softmax")
#Building the model
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu', input_shape=X_train.shape[1:]))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(rate=0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(rate=0.5))
model.add(Dense(43, activation='softmax'))
Step7: Compile the model. The model needs to be compiled with its hyperparameters. We have used the Adam
optimizer, which gives better accuracy and converges fast. Since the classes are categorical,
"categorical_crossentropy" is used as the loss function with the Adam optimizer.
#Compilation of the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 15
history = model.fit(X_train, y_train, batch_size=32, epochs=epochs, validation_data=(X_test, y_test))
Epoch 1/15
981/981 [==============================] - 88s 79ms/step - loss: 2.0051 - accuracy: 0.4934 - val_loss: 0.6057 - val_accuracy: 0.855
Epoch 2/15
981/981 [==============================] - 81s 82ms/step - loss: 0.7954 - accuracy: 0.7685 - val_loss: 0.2969 - val_accuracy: 0.927
Epoch 3/15
981/981 [==============================] - 80s 81ms/step - loss: 0.5348 - accuracy: 0.8429 - val_loss: 0.2080 - val_accuracy: 0.936
Epoch 4/15
981/981 [==============================] - 80s 81ms/step - loss: 0.4474 - accuracy: 0.8716 - val_loss: 0.1528 - val_accuracy: 0.961
Epoch 5/15
981/981 [==============================] - 82s 83ms/step - loss: 0.3850 - accuracy: 0.8864 - val_loss: 0.1238 - val_accuracy: 0.964
Epoch 6/15
981/981 [==============================] - 80s 82ms/step - loss: 0.3487 - accuracy: 0.8999 - val_loss: 0.1113 - val_accuracy: 0.967
Epoch 7/15
981/981 [==============================] - 82s 84ms/step - loss: 0.3119 - accuracy: 0.9100 - val_loss: 0.0806 - val_accuracy: 0.976
Epoch 8/15
981/981 [==============================] - 87s 88ms/step - loss: 0.2839 - accuracy: 0.9183 - val_loss: 0.0767 - val_accuracy: 0.977
Epoch 9/15
981/981 [==============================] - 82s 84ms/step - loss: 0.2678 - accuracy: 0.9216 - val_loss: 0.0806 - val_accuracy: 0.975
Epoch 10/15
981/981 [==============================] - 84s 86ms/step - loss: 0.2657 - accuracy: 0.9250 - val_loss: 0.0644 - val_accuracy: 0.981
Epoch 11/15
981/981 [==============================] - 83s 85ms/step - loss: 0.2656 - accuracy: 0.9253 - val_loss: 0.2162 - val_accuracy: 0.944
Epoch 12/15
981/981 [==============================] - 119s 122ms/step - loss: 0.2760 - accuracy: 0.9221 - val_loss: 0.0602 - val_accuracy: 0.9
Epoch 13/15
981/981 [==============================] - 115s 118ms/step - loss: 0.2144 - accuracy: 0.9393 - val_loss: 0.0689 - val_accuracy: 0.9
Epoch 14/15
981/981 [==============================] - 90s 92ms/step - loss: 0.2654 - accuracy: 0.9279 - val_loss: 0.0652 - val_accuracy: 0.979
Epoch 15/15
981/981 [==============================] - 82s 84ms/step - loss: 0.2442 - accuracy: 0.9338 - val_loss: 0.0781 - val_accuracy: 0.977
Step8: Train and validate the model. After building the CNN model with the required hyperparameters, we use the
training images to train the model using model.fit(). The model was trained by varying the batch size and epoch
count. Initially the epoch count was 10 and the batch size 50. Next, both were increased to 15 epochs and a
batch size of 64, and the accuracy was 92%. Increasing both parameters further gave no significant
improvement in accuracy or loss.
Step9: Plot the accuracy and loss. Graphs are plotted to visualize epoch vs. accuracy and epoch vs. loss.
#plotting graphs for accuracy
plt.figure(0)
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.title('Accuracy')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()
plt.show()
plt.figure(1)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.title('Loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()
plt.show()
Step10: Load the test data along with labels. This dataset contains a Test folder with the test images; the Test.csv
file contains details of the image paths and their respective class labels. We first extract the image paths and labels using pandas.
Images are resized to 30x30 pixels, and a NumPy array is used to store all the image data.
#testing accuracy on test dataset
cur_path = os.getcwd()
print(cur_path)
#Retrieving the images and their labels
#for i in range(classes):
path = os.path.join(cur_path, 'Traffic Sign Recognition\Test')
images = os.listdir(path)
print(path)
print(images)
Step11: Predict the class. Using the test images, the class labels are predicted. We import the required metrics from
sklearn.metrics and observe the predicted values.
# Assuming a multi-class classification model
pred = np.argmax(model.predict(X_test), axis=1)
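A sketch of how the predictions could be scored, assuming the true class IDs read from Test.csv are held in a variable such as test_labels (hypothetical name):
from sklearn.metrics import accuracy_score
# test_labels: true class IDs read from Test.csv (hypothetical variable name)
print("Test accuracy:", accuracy_score(test_labels, pred))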
Convolutional Neural Network (CNN) Model: A sequential model is created using Keras, representing a
Convolutional Neural Network (CNN) for image classification. The model consists of convolutional layers with
ReLU activation, max-pooling layers, a flattening layer, and two dense
layers (fully connected) with ReLU activation for feature extraction and classification. The output layer has 10
neurons corresponding to the 10 classes in the Fashion-MNIST dataset, and no activation function is specified,
indicating a raw output used for classification
The model was trained for 10 epochs, achieving decreasing training and validation losses, as well as increasing
training and validation accuracies. The final accuracy on the validation set is around 91.25%, indicating effective
learning. However, it's important to monitor for signs of overfitting, where the model may perform well on the
training set but not generalize well to new data.
Generate Predictions: The model.predict method is employed to generate predictions on the test dataset
(test_images). The resulting predictions are then processed to obtain the corresponding class labels by finding
the index of the maximum value in each prediction using np.argmax. The predicted labels are stored in the
predicted_labels variable, providing the model's classification for the test dataset.
# Generate predictions on the test dataset
predictions = model.predict(test_images)
predicted_labels = [class_labels[np.argmax(prediction)] for prediction in predictions]
313/313 [==============================] - 4s 13ms/step
Visualize Sample Predictions: Matplotlib is used to create a 5x5 grid of subplots for visualizing a subset of the
test dataset along with their predicted labels. For each subplot, an image from test_images is displayed in
grayscale. The true class label is obtained from class_labels[test_labels[i]], and the predicted label is obtained
from the predicted_labels array. The title of each subplot shows the true and predicted labels for comparison,
providing a visual assessment of the model's performance on the test data.
# Print some sample images with predicted labels
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(test_images[i], cmap='gray')
    predicted_label = predicted_labels[i]
    plt.title(f"True: {class_labels[test_labels[i]]}, Predicted: {predicted_label}")
    plt.axis('off')
plt.show()
Evaluate Model on Test Data: The model.evaluate method is used to assess the model's performance on
the test dataset (test_images and test_labels). The resulting test loss and accuracy are stored in the
variables test_loss and test_accuracy. The test accuracy is then printed to the console, providing a
quantitative measure of the model's ability to generalize to unseen data.
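A sketch of the evaluation call this paragraph describes:
test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=1)
print(f"Test accuracy: {test_accuracy}")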
Visualize Training History: Matplotlib is used to plot the training accuracy (history.history['accuracy']) and
validation accuracy
(history.history['val_accuracy']) over the epochs. The x-axis represents the number of training epochs, and
the y-axis represents the
corresponding accuracy values. This visualization provides insight into the training process, showing how
well the model performs on both the training and validation sets over time.
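A sketch of the plotting code this describes:
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()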
Model Performance Report: The code prints a summary report on the model's performance.
Specifically, it prints the test accuracy obtained from the evaluation of the model on the test
dataset. This provides a concise overview of the model's accuracy on unseen data, summarizing
its effectiveness in making predictions.
# report on the model's performance
print("Model Performance Report:")
print(f"Test accuracy: {test_accuracy}")
Model Performance Report:
Test accuracy: 0.9061999917030334
The model's performance on the test dataset is reported as an accuracy of approximately 90.62%. This metric reflects the proportion of correctly classified