
UNIT-5

Deep Learning is a subfield of machine learning concerned with algorithms inspired by the
structure and function of the brain called artificial neural networks.

It has networks capable of learning unsupervised from data that is unstructured or unlabelled.
Also known as deep neural learning or deep neural network.

As we construct larger neural networks and train them with more and more data, their performance continues to increase.

This is generally different from other machine learning techniques, which reach a plateau in performance.

Deep learning is used across all industries for a number of different tasks. Commercial apps that use image recognition, open-source platforms with consumer recommendation apps, and medical research tools that explore the possibility of reusing drugs for new ailments are a few examples of how deep learning is being incorporated.

Modern state-of-the-art deep learning is focused on training deep (many layered) neural
network models using the backpropagation algorithm. The most popular techniques are:

 Multilayer Perceptron Networks.
 Convolutional Neural Networks.
 Long Short-Term Memory Recurrent Neural Networks.

Multilayer Perceptron
A multi-layered perceptron (MLP) is one of the most common neural network models used in
the field of deep learning.

Often referred to as a “vanilla” neural network, an MLP is simpler than the complex models of
today’s era.

However, the techniques it introduced have paved the way for further advanced neural
networks.

The multilayer perceptron (MLP) is used for a variety of tasks, such as stock analysis, image
identification, spam detection, and election voting predictions.

The Basic Structure

A multi-layered perceptron consists of interconnected neurons transferring information to each other, much like the human brain. Each neuron is assigned a value. The network can be divided into three main layers.



Input Layer

This is the initial layer of the network which takes in an input which will be used to produce an
output.

Hidden Layer(s)

The network needs to have at least one hidden layer. The hidden layer(s) perform computations
and operations on the input data to produce something meaningful.

Output Layer

The neurons in this layer display a meaningful output.

Connections

The MLP is a feedforward neural network, which means that the data is transmitted from the
input layer to the output layer in the forward direction.

The connections between the layers are assigned weights. The weight of a connection specifies
its importance. This concept is the backbone of an MLP’s learning process.

While the inputs take their values from the surroundings, the values of all the other neurons are
calculated through a mathematical function involving the weights and values of the layer before
it.

For example, the value of the h5 node could be:

h5 = h1×w8 + h2×w9



Backpropagation:
Backpropagation is a technique used to optimize the weights of an MLP using the outputs as
inputs.

In a conventional MLP, random weights are assigned to all the connections. These random
weights propagate values through the network to produce the actual output.

Naturally, this output would differ from the expected output. The difference between the two
values is called the error.

Backpropagation refers to the process of sending this error back through the network,
readjusting the weights automatically so that eventually, the error between the actual and
expected output is minimized.

Input values

X1=0.05
X2=0.10

Initial weight

W1=0.15 w5=0.40
W2=0.20 w6=0.45
W3=0.25 w7=0.50
W4=0.30 w8=0.55

Bias Values

b1=0.35 b2=0.60



Target Values

T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass

To find the value of H1, we first multiply the input values by the corresponding weights and add the bias:

H1 = x1×w1 + x2×w2 + b1
H1 = 0.05×0.15 + 0.10×0.20 + 0.35
H1 = 0.3775

To calculate the final result of H1, we apply the sigmoid function:

H1 = 1/(1 + e^(−0.3775)) = 0.593269992

We will calculate the value of H2 in the same way as H1:

H2 = x1×w3 + x2×w4 + b1
H2 = 0.05×0.25 + 0.10×0.30 + 0.35
H2 = 0.3925

To calculate the final result of H2, we apply the sigmoid function:

H2 = 1/(1 + e^(−0.3925)) = 0.596884378

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we first multiply the inputs, i.e. the outputs of H1 and H2, by the weights:



y1=H1×w5+H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final result of y1, we apply the sigmoid function:

y1 = 1/(1 + e^(−1.10590597)) = 0.75136507

We will calculate the value of y2 in the same way as y1

y2=H1×w7+H2×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final result of y2, we apply the sigmoid function:

y2 = 1/(1 + e^(−1.2249214)) = 0.772928465

Our target values are 0.01 and 0.99. Our y1 and y2 values do not match the target values T1 and T2.

Now, we will find the total error, which is the sum of the squared differences between the outputs and the target outputs:

E1 = ½(T1 − y1)² = ½(0.01 − 0.75136507)² = 0.274811083
E2 = ½(T2 − y2)² = ½(0.99 − 0.772928465)² = 0.023560026

So, the total error is

E_total = E1 + E2 = 0.274811083 + 0.023560026 = 0.298371109



Now, we will back propagate this error to update the weights using a backward pass.

In this way, the output of the current iteration becomes the input and affects the next output.
This is repeated until the correct output is produced. The weights at the end of the process
would be the ones on which the neural network works correctly.
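The forward pass and total error above can be reproduced in a few lines of Python. This is a minimal sketch using NumPy, with variable names matching the worked example:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# inputs, weights, biases and targets from the worked example above
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# forward pass through the hidden layer
h1 = sigmoid(x1 * w1 + x2 * w2 + b1)   # ~0.59327
h2 = sigmoid(x1 * w3 + x2 * w4 + b1)   # ~0.59688

# forward pass through the output layer
y1 = sigmoid(h1 * w5 + h2 * w6 + b2)   # ~0.75137
y2 = sigmoid(h1 * w7 + h2 * w8 + b2)   # ~0.77293

# squared error for each output and the total error
e_total = 0.5 * (t1 - y1) ** 2 + 0.5 * (t2 - y2) ** 2
print(h1, h2, y1, y2, e_total)         # total error ~0.29837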

Loss functions :
In the context of an optimization algorithm, the function used to evaluate a candidate solution
is referred to as the objective function.

We may seek to maximize or minimize the objective function, meaning that we are searching for
a candidate solution that has the highest or lowest score respectively.

Typically, with neural networks, we seek to minimize the error. As such, the objective function is
often referred to as a cost function or a loss function and the value calculated by the loss
function is referred to as simply “loss.”

The cost function reduces all the various good and bad aspects of a possibly complex system
down to a single number, a scalar value, which allows candidate solutions to be ranked and
compared.

There are many functions that could be used to estimate the error of a set of weights in a neural
network.

Maximum likelihood seeks to find the optimum values for the parameters by maximizing a
likelihood function derived from the training data.

Under the framework of maximum likelihood, the error between two probability distributions is measured using cross-entropy.

Under maximum likelihood estimation, we would seek a set of model weights that minimize the
difference between the model’s predicted probability distribution given the dataset and the
distribution of probabilities in the training dataset. This is called the cross-entropy.

Our parametric model defines a distribution and we simply use the principle of maximum
likelihood. This means we use the cross-entropy between the training data and the model’s
predictions as the cost function.



1) Mean Squared Error Loss

Mean Squared Error loss, or MSE for short, is calculated as the average of the squared
differences between the predicted and actual values.

The result is always positive regardless of the sign of the predicted and actual values and a
perfect value is 0.0. The loss value is minimized, although it can be used in a maximization
optimization process by making the score negative.
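As a quick illustration, MSE can be computed directly with NumPy; the target and predicted values below are arbitrary example numbers, not taken from a real model:

import numpy as np

y_true = np.array([0.01, 0.99, 0.50])      # actual (target) values
y_pred = np.array([0.10, 0.80, 0.45])      # values predicted by the model
mse = np.mean((y_true - y_pred) ** 2)      # average of the squared differences
print(mse)                                 # 0.0 would be a perfect score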

2) Cross-Entropy Loss (or Log Loss)

Cross-entropy loss is often simply referred to as “cross-entropy,” “logarithmic loss,” “logistic loss,” or “log loss” for short.

Each predicted probability is compared to the actual class output value (0 or 1) and a score is
calculated that penalizes the probability based on the distance from the expected value. The
penalty is logarithmic.

Cross-entropy loss is minimized, where smaller values represent a better model than larger
values. A model that predicts perfect probabilities has a cross entropy or log loss of 0.0.
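A minimal sketch of binary cross-entropy (log loss) in NumPy; the labels and probabilities below are made-up examples:

import numpy as np

y_true = np.array([1, 0, 1, 1])                  # actual class values (0 or 1)
y_pred = np.array([0.9, 0.2, 0.8, 0.6])          # predicted probabilities of class 1
y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)       # avoid log(0)

# the penalty for each prediction is logarithmic in its distance from the true class
loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(loss)                                      # smaller is better; 0.0 is a perfect model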

Hyper parameter tuning :


Deep learning models are full of hyper-parameters and finding the best configuration for these
parameters in such a high dimensional space is not a trivial challenge.

Let us first understand these hyper-parameters: learning rate, batch size, momentum, and
weight decay.

These hyper-parameters act as knobs which can be tweaked during the training of the model. For our model to provide the best results, we need to find the optimal values of these hyper-parameters.

The hyper-parameter tuning process is a tightrope walk to achieve a balance between underfitting and overfitting.

Underfitting is when the machine learning model is unable to reduce the error for either the
test or training set. An underfitting model is not powerful enough to fit the underlying
complexities of the data distributions.

Overfitting happens when the machine learning model is so powerful as to fit the training set
too well and the generalization error increases.

Hyperparameter Optimization methods

Hyperparameters can have a direct impact on the training of machine learning algorithms. Thus,
to achieve maximal performance, it is important to understand how to optimize them. Here are
some common strategies for optimizing hyperparameters:



1. Manual Hyperparameter Tuning

Traditionally, hyperparameters were tuned manually by trial and error. This is still commonly
done, and experienced engineers can “guess” parameter values that will deliver very high
accuracy for ML models. However, there is a continual search for better, faster, and more
automatic methods to optimize hyperparameters.

2. Grid Search

Grid search is arguably the most basic hyperparameter tuning method. With this technique, we simply build a model for each possible combination of the hyperparameter values provided, evaluate each model, and select the configuration that produces the best results.

Grid-search does NOT only apply to one model type but can be applied across machine learning
to calculate the best parameters to use for any given model.

For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be optimized for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of “reasonable” values for each.

Grid search then trains an SVM with each pair (C, γ) in the cartesian product of these two sets
and evaluates their performance on a held-out validation set (or by internal cross-validation on
the training set, in which case multiple SVMs are trained per pair). Finally, the grid search
algorithm outputs the settings that achieved the highest score in the validation procedure.

We then use the best set of hyperparameter values chosen by the grid search in the actual model, as sketched below.
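Here is a minimal grid-search sketch with scikit-learn's GridSearchCV, assuming an RBF-kernel SVM; the built-in iris data and the candidate values for C and gamma are stand-ins chosen for illustration:

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)    # stand-in dataset

# finite sets of "reasonable" values for C and gamma
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}

# trains one SVM per (C, gamma) pair, scored by internal cross-validation
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)    # the settings that achieved the highest validation score
print(grid.best_score_)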



One of the drawbacks of grid search is the curse of dimensionality: the number of evaluations grows exponentially with the number of hyperparameters. There is also no guarantee that the search will produce the perfect solution, since it only evaluates points on the grid and usually finds values that are merely close to the right set.

3. Random Search

Often some of the hyperparameters matter much more than others. Performing random search
rather than grid search allows a much more precise discovery of good values for the important
ones.

Random Search sets up a grid of hyperparameter values and selects random combinations to
train the model and score. This allows you to explicitly control the number of parameter
combinations that are attempted. The number of search iterations is set based on time or
resources. Scikit Learn offers the RandomizedSearchCV function for this process.

The chances of finding optimal parameters are comparatively higher with random search, because the random sampling pattern means the model may end up being trained on near-optimal parameters without being restricted to the grid points. Random search works best for lower-dimensional data, since a good set of values can be found in fewer iterations, and it is the preferred search technique when the number of dimensions is small. In the case of deep learning algorithms, it outperforms grid search.
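A corresponding sketch with RandomizedSearchCV, again using the iris data as a stand-in; here n_iter explicitly controls how many parameter combinations are attempted:

from scipy.stats import loguniform
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)    # stand-in dataset

# distributions to sample C and gamma from, instead of a fixed grid
param_dist = {'C': loguniform(1e-2, 1e3), 'gamma': loguniform(1e-4, 1e1)}

search = RandomizedSearchCV(SVC(kernel='rbf'), param_dist, n_iter=20, cv=5, random_state=42)
search.fit(X, y)

print(search.best_params_, search.best_score_)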



As an illustration (based on a figure comparing the two methods), say that you have two parameters: with a 5×6 grid search you check only five or six different values of each parameter, while with random search you check 14 different values of each parameter.

4. Bayesian Optimization

Bayesian optimization belongs to a class of sequential model-based optimization (SMBO) algorithms that allow one to use the results of the previous iteration to improve the sampling method of the next experiment.

Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian process) that best describes the function you want to optimize.


As the number of observations grows, the posterior distribution improves, and the algorithm
becomes more certain of which regions in parameter space are worth exploring and which are
not.
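A minimal Bayesian-optimization sketch, assuming the scikit-optimize (skopt) package is installed; the objective function here is a hypothetical stand-in for a model's validation error:

from skopt import gp_minimize
from skopt.space import Real

# hypothetical objective: in practice this would train a model and return its validation error
def objective(params):
    C, gamma = params
    return (C - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2

search_space = [Real(1e-2, 1e3, prior='log-uniform', name='C'),
                Real(1e-4, 1e1, prior='log-uniform', name='gamma')]

# a Gaussian-process posterior over the objective guides which point to try next
result = gp_minimize(objective, search_space, n_calls=25, random_state=42)
print(result.x, result.fun)    # best parameters found and their objective value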

Overview of RNN, CNN and LSTM :


In deep learning there are a few fundamental neural network architectures that perform well on different types of data: the feed-forward neural network (FFNN), RNN, CNN and LSTM.

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is essentially an FFNN with a time twist: it addresses the feed-forward network's inability to handle sequential data.



This neural network isn’t stateless, has connections between passes and connections through
time.

They are a class of artificial neural network where connections between nodes form a directed
graph along a sequence like features links from a layer to previous layers, allowing information
to flow back into the previous parts of the network thus each model in the layers depends on
past events, allowing information to persist.

In this way, RNNs can use their internal state (memory) to process sequences of inputs. This
makes them applicable to tasks such as unsegmented, connected handwriting recognition or
speech recognition.

They work not only on the current input but also on related information from the past, which means the order in which you feed and train the network matters: feeding it ‘chicken’ then ‘egg’ may give a different output than ‘egg’ then ‘chicken’.

RNNs also have problems like vanishing (or exploding) gradient/long-term dependency problem
where information rapidly gets lost over time.

Actually, it’s the weight which gets lost when it reaches a value of 0 or 1 000 000, not the
neuron.

But in this case, the previous state won’t be very informative as it’s the weight which stores the
information from the past.
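To make this concrete, here is a minimal recurrent layer in Keras; the sequence length (10 timesteps) and feature size (8) are illustrative assumptions, not values from the text:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
# the recurrent layer's internal state carries information from earlier timesteps to later ones
model.add(SimpleRNN(units=32, input_shape=(10, 8)))   # 10 timesteps, 8 features per step
model.add(Dense(1, activation='sigmoid'))             # e.g. one prediction per sequence
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()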

Long Short Term Memory (LSTM)

Long Short Term Memory is a kind of recurrent neural network. In RNN output from the last
step is fed as input in the current step.

LSTM was designed by Hochreiter & Schmidhuber. It tackled the long-term dependency problem of RNNs, in which an RNN cannot use information stored far back in the sequence and can only give accurate predictions from recent information.

As the gap length increases, an RNN does not give efficient performance. An LSTM can, by default, retain information for long periods of time.



It is used for processing, predicting and classifying on the basis of time series data.

Structure Of LSTM:

LSTM has a chain structure that contains four neural networks and different memory blocks
called cells.

Information is retained by the cells and the memory manipulations are done by the gates. There
are three gates –

Forget Gate: Information that is no longer useful in the cell state is removed with the forget gate.



Input gate: Addition of useful information to the cell state is done by input gate.

Output gate: The task of extracting useful information from the current cell state to be
presented as an output is done by output gate.

Some of the famous applications of LSTM include:

1. Language Modelling
2. Machine Translation
3. Image Captioning
4. Handwriting generation
5. Question Answering Chatbots
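A minimal LSTM sketch in Keras along the same lines; the input shape (20 timesteps of 50 features, e.g. word embeddings) is an illustrative assumption:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# the LSTM cell's forget, input and output gates decide what to discard, add and emit
model.add(LSTM(units=64, input_shape=(20, 50)))   # 20 timesteps, 50 features per step
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()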

Convolutional Neural Network (CNN) :

A convolutional neural network (CNN, or ConvNet) is another class of deep neural networks. CNNs are most commonly employed in computer vision.

Given a series of images or videos from the real world, with the utilization of CNN, the AI
system learns to automatically extract the features of these inputs to complete a specific task,
e.g., image classification, face authentication, and image semantic segmentation.

Different from fully connected layers in MLPs, in CNN models, one or multiple convolution layers
extract the simple features from input by executing convolution operations.
Each layer is a set of nonlinear functions of weighted sums at different coordinates of spatially
nearby subsets of outputs from the prior layer, which allows the weights to be reused.
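To make the convolution operation concrete, here is a small sketch that slides a 3×3 filter over a grayscale image array; the random image and the hand-written edge filter are illustrative only (in a CNN the filter weights are learned):

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(8, 8)     # stand-in for a small grayscale image

# a simple 3x3 filter that responds to vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# the same weights are reused at every spatial position of the input
feature_map = convolve2d(image, kernel, mode='valid')
print(feature_map.shape)         # (6, 6): one response per spatial location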

Concept of a Convolution Neural Network (CNN)

Applying various convolutional filters, CNN machine learning models can capture the high-level
representation of the input data, making it most popular for computer vision tasks, such as
image classification (e.g., AlexNet, VGG network, ResNet, MobileNet) and object detection (e.g.,
Fast R-CNN, Mask R-CNN, YOLO, SSD).

 AlexNet. For image classification, as the first CNN neural network to win the ImageNet
Challenge in 2012, AlexNet consists of five convolution layers and three fully connected
layers. Thus, AlexNet requires 61 million weights and 724 million MACs (multiply-add
computation) to classify the image with a size of 227×227.
 VGG-16. To achieve higher accuracy, VGG-16 is trained to a deeper structure of 16 layers
consisting of 13 convolution layers and three fully connected layers, requiring 138
million weights and 15.5G MACs to classify the image with a size of 224×224.
 GoogleNet. To improve accuracy while reducing the computation of DNN inference,
GoogleNet introduces an inception module composed of different sized filters. As a
result, GoogleNet achieves a better accuracy performance than VGG-16 while only
requiring seven million weights and 1.43G MACs to process the image with the same
size.
 ResNet. ResNet, the state-of-the-art effort, uses the “shortcut” structure to reach a
human-level accuracy with a top-5 error rate below 5%. In addition, the “shortcut”
module is used to solve the gradient vanishing problem during the training process,
making it possible to train a DNN model with a deeper structure.

Data Science application to text :



Text data analysis is becoming easier and easier every day. Prominent programming languages
like Python and R have great libraries for text data analysis.

Text analytics is the process of transforming unstructured text documents into usable,
structured data. Text analysis works by breaking apart sentences and phrases into their
components and then evaluating each part's role and meaning using complex software rules and
machine learning algorithms.

Decades ago, text analytics involved simple tasks like calculating word frequencies. Over the last
few years, artificial intelligence technologies like natural language understanding (NLU) and
machine learning, and techniques like deep learning have dramatically improved the
effectiveness of text analytics.

Text Analytics techniques can be understood as the processes that go into mining the text and
discovering insights from it. These text mining techniques generally employ different text mining
tools and applications for their execution.

The following are the various text mining techniques:

1) Information Extraction:

This is the most popular text mining technique. Information extraction refers to the process of extracting meaningful information from vast chunks of textual data. Whatever information is extracted is then stored in a database for future access and retrieval.

2) Clustering:

Clustering is one of the most crucial text mining techniques. It seeks to identify intrinsic
structures in textual information and organize them into relevant subgroups or 'clusters' for
further analysis.

3) Summarization:

Text summarization refers to the process of automatically generating a compressed version of a specific text that holds valuable information for the end-user. This text mining technique aims to browse through multiple text sources to craft summaries of texts. Text summarization integrates and combines the various methods that employ text categorization like decision trees, neural networks, regression models, and swarm intelligence.

4) Categorization:

This is one of those text mining techniques that is a form of "supervised" learning, wherein normal-language texts are assigned to a predefined set of topics depending upon their content. Thus, categorization, or rather Natural Language Processing (NLP), is a process of gathering text documents and processing and analyzing them to uncover the right topics or indexes for each document.
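As a small illustration of categorization, here is a sketch using scikit-learn with TF-IDF features and a Naive Bayes classifier; the tiny corpus and topic labels are made up for this example:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# made-up documents and their predefined topics
docs = ["the stock market fell sharply today",
        "the team won the football match",
        "shares rallied after the earnings report",
        "the striker scored twice in the final"]
labels = ["finance", "sport", "finance", "sport"]

# vectorize the text and train a supervised classifier on the labelled documents
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["the goalkeeper saved a penalty"]))   # expected: ['sport']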

Data Science application to images :


Digital Image Processing consists of the various techniques and methods involved in the manipulation of images on a computer.

An image is basically a two-dimensional signal, described by a function f(x, y) whose value at each point (x, y) gives the pixel at that point. In digital form, an image is a two-dimensional array of numbers between 0 and 255.

Image Processing helps in:


1. Improvement in digital information stored by us.

2. Making working with images automated.

3. Better image optimization leading to efficient storage and transmission.

Getting started with Image Processing in Python:

Let us get started with some basic Image related tasks in Python. We will make use of PIL.

PIL:

Python Imaging Library is used for various image processing tasks.

Installation:
pip install pillow

With PIL installed, we can now move to the code.

First, we work with some matplotlib functions.

import matplotlib.image as mpimg   # aliased as mpimg so it is not shadowed by the img variable below
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

The following image will be read. It is named image1.jpg.

# reading jpg image



img = mpimg.imread('image1.jpg')
plt.imshow(img)

The image is read.

# selecting a single colour channel, which reduces the image to a 2-D array

lum1 = img[:, :, 0]
plt.imshow(lum1)

Now the image has been reduced to a single colour channel (a 2-D array).

Now we will change it into the “hot” colourmap.

plt.imshow(lum1, cmap ='hot')


plt.colorbar()



Image output looks:

Now we try a different colormap.

imgplot = plt.imshow(lum1)
imgplot.set_cmap('nipy_spectral')

Image output:

The reason for using colormaps is that, often in various applications and uses, having a uniform

colormap helps. Read more about Colourmaps: Choosing Colormaps in Matplotlib.

Now, we work with PIL.

from PIL import Image

We will use this image file, named as: people.jpg.


img2 = Image.open('people.jpg')
plt.imshow(img2)

The image is read.

Now, we resize the image.

img2.thumbnail((50, 50), Image.ANTIALIAS) # resizes image in-place; newer Pillow versions use Image.LANCZOS instead


imgplot = plt.imshow(img2)



imgplot1 = plt.imshow(img2, interpolation="nearest")

imgplot2 = plt.imshow(img2, interpolation="bicubic")



But, why do we purposefully blur images in Image Processing? Well, often for Pattern
Recognition and Computer Vision algorithms, it becomes difficult to process the images if they
are very sharp. Thus blurring is done to make the images smooth. Blurring also makes the colour
transition in an image, from one side to the other, a lot more smooth.

Now, let us verify the dimensions of the car image, we worked on earlier.

#some more interesting stuff


file = 'image1.jpg'
with Image.open(file) as image:
    width, height = image.size   # the image width and height are obtained here

These are the dimensions we got earlier as well. So we can conclude that the image is 320*658.

Let us also try rotating and transposing the image.

#Relative Path
img3 = Image.open("image1.jpg")
#Angle given
img_rot= img3.rotate(180)
#Saved in the same relative location
img_rot.save("rotated_picture.jpg")

This is the rotated image.



#transposing image
transposed_img = img3.transpose(Image.FLIP_LEFT_RIGHT)
#Saved in the same relative location
transposed_img.save("transposed_img.jpg")

This is the transposed image.

Data Science application to videos:


Videos are nothing but a collection of a set of images. These images are called frames and can
be combined to get the original video. So, a problem related to video data is not that different
from an image classification or an object detection problem. There is just one extra step of
extracting frames from the video.

For performing video analysis I’m taking the video of Tom and Jerry to calculate the screen time
of both Tom and Jerry from a given video. Let me first summarize the steps we will follow in this
article to crack this problem:

I. Import and read the video, extract frames from it, and save them as images
II. Label a few images for training the model
III. Build our model on training data
IV. Make predictions for the remaining images
V. Calculate the screen time of both TOM and JERRY

Just following these steps will help you in solving many such video related problems in deep
learning.

How to handle video files in Python

Let us start with importing all the necessary libraries. Go ahead and install the below libraries in
case you haven’t already:



NumPy, Pandas, Matplotlib, Keras, Skimage, OpenCV

import cv2 # for capturing videos


import math # for mathematical operations
import matplotlib.pyplot as plt # for plotting the images
%matplotlib inline
import pandas as pd
from keras.preprocessing import image # for preprocessing the images
import numpy as np # for mathematical operations
from keras.utils import np_utils
from skimage.transform import resize # for resizing images

Step – 1: Read the video, extract frames from it and save them as images

Now we will load the video and convert it into frames. We will first capture the video from the
given directory using the VideoCapture() function, and then we’ll extract frames from the video
and save them as an image using the imwrite() function. Let’s code it:

count = 0
videoFile = "Tom and jerry.mp4"
cap = cv2.VideoCapture(videoFile)   # capturing the video from the given path
frameRate = cap.get(5)              # frame rate
x = 1
while(cap.isOpened()):
    frameId = cap.get(1)            # current frame number
    ret, frame = cap.read()
    if (ret != True):
        break
    if (frameId % math.floor(frameRate) == 0):
        filename = "frame%d.jpg" % count; count += 1
        cv2.imwrite(filename, frame)
cap.release()
print("Done!")

Done!

Once this process is complete, ‘Done!’ will be printed on the screen as confirmation that the
frames have been created.

Let us try to visualize an image (frame). We will first read the image using the imread() function
of matplotlib, and then plot it using the imshow() function.

img = plt.imread('frame0.jpg') # reading image using its name

plt.imshow(img)



This is the first frame from the video. We have extracted one frame for each second, from the
entire duration of the video. Since the duration of the video is 4:58 minutes (298 seconds), we
now have 298 images in total.

Our task is to identify which image has TOM, and which image has JERRY.

Step – 2: Label a few images for training the model

A possible solution is to manually give labels to a few of the images and train the model on
them. Once the model has learned the patterns, we can use it to make predictions.

0 – neither JERRY nor TOM
1 – for JERRY
2 – for TOM

I have labelled all the images in the mapping.csv file which contains each image name and their corresponding class (0 or 1 or 2).

data = pd.read_csv('mapping.csv') # reading the csv file

data.head() # printing first five rows of the file

The mapping file contains two columns:

Image_ID: Contains the name of each image

Class: Contains corresponding class for each image



Our next step is to read the images which we will do based on their names, aka,
the Image_ID column.

X = [ ]   # creating an empty array
for img_name in data.Image_ID:
    img = plt.imread('' + img_name)
    X.append(img)   # storing each image in array X
X = np.array(X)     # converting list to array

Tada! We now have the images with us. Remember, we need two things to train our model:

Training images, and Their corresponding class

Since there are three classes, we will one hot encode them using the to_categorical() function
of keras.utils.

y = data.Class

dummy_y = np_utils.to_categorical(y) # one hot encoding Classes

We will be using a VGG16 pretrained model which takes an input image of shape (224 X 224 X
3). Since our images are in a different size, we need to reshape all of them. We will use
the resize() function of skimage.transform to do this.

image = []
for i in range(0, X.shape[0]):
    a = resize(X[i], preserve_range=True, output_shape=(224, 224)).astype(int)   # reshaping to 224*224*3
    image.append(a)
X = np.array(image)

All the images have been reshaped to 224 X 224 X 3. But before passing any input to the model, we must preprocess it. Use the preprocess_input() function of keras.applications.vgg16 to perform this step.

from keras.applications.vgg16 import preprocess_input

X = preprocess_input(X, mode='tf') # preprocessing the input data

We also need a validation set to check the performance of the model on unseen images. We will
make use of the train_test_split() function of the sklearn.model_selection module to randomly
divide images into training and validation set.

from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, dummy_y, test_size=0.3, random_state=42)   # preparing the validation set



Step 3: Building the model

The next step is to build our model. As mentioned, we shall be using the VGG16 pretrained
model for this task. Let us first import the required libraries to build the model:

from keras.models import Sequential

from keras.applications.vgg16 import VGG16

from keras.layers import Dense, InputLayer, Dropout

We will now load the VGG16 pretrained model and store it as base_model:

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))   # include_top=False to remove the top layer

We will make predictions using this model for X_train and X_valid, get the features, and then
use those features to retrain the model.

X_train = base_model.predict(X_train)

X_valid = base_model.predict(X_valid)

X_train.shape, X_valid.shape

The shape of X_train and X_valid is (208, 7, 7, 512), (90, 7, 7, 512) respectively. In order to pass
it to our neural network, we have to reshape it to 1-D.

X_train = X_train.reshape(208, 7*7*512) # converting to 1-D

X_valid = X_valid.reshape(90, 7*7*512)

We will now preprocess the images and make them zero-centered which helps the model to
converge faster.

train = X_train/X_train.max() # centering the data

X_valid = X_valid/X_train.max()

Finally, we will build our model. This step can be divided into 3 sub-steps:

I. Building the model


II. Compiling the model
III. Training the model

# i. Building the model

model = Sequential()
model.add(InputLayer((7*7*512,))) # input layer
model.add(Dense(units=1024, activation='sigmoid')) # hidden layer
model.add(Dense(3, activation='softmax')) # output layer
Let’s check the summary of the model using the summary() function:



model.summary()

We have a hidden layer with 1,024 neurons and an output layer with 3 neurons (since we have 3
classes to predict). Now we will compile our model:

# ii. Compiling the model

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In the final step, we will fit the model and simultaneously also check its performance on the
unseen images, i.e., validation images:

# iii. Training the model

model.fit(train, y_train, epochs=100, validation_data=(X_valid, y_valid))

In the next section, we will try to calculate the screen time of TOM and JERRY in a new video.

Next, we will import the images for testing and then reshape them as per the requirements of
the aforementioned pretrained model:

test_image = []
for img_name in test.Image_ID:   # 'test' is a DataFrame of test image names
    img = plt.imread('' + img_name)
    test_image.append(img)
test_img = np.array(test_image)

test_image = []
for i in range(0, test_img.shape[0]):
    a = resize(test_img[i], preserve_range=True, output_shape=(224, 224)).astype(int)
    test_image.append(a)
test_image = np.array(test_image)



We need to make changes to these images similar to the ones we did for the training images.
We will preprocess the images, use the base_model.predict() function to extract features from
these images using the VGG16 pretrained model,

# preprocessing the images

test_image = preprocess_input(test_image, mode='tf')

# extracting features from the images using pretrained model

test_image = base_model.predict(test_image)

Step – 4: Make predictions for the remaining images

predictions = model.predict_classes(test_image)

Step – 5 Calculate the screen time of both TOM and JERRY

Recall that Class ‘1’ represents the presence of JERRY, while Class ‘2’ represents the presence of
TOM. We shall make use of the above predictions to calculate the screen time of both these
legendary characters:

print("The screen time of JERRY is", predictions[predictions==1].shape[0], "seconds")


print("The screen time of TOM is", predictions[predictions==2].shape[0], "seconds")

Image classification:
Image classification is the process of taking an input (like a picture) and outputting a class (like “cat”)
or a probability that the input is a particular class (“there’s a 90% probability that this input is a cat”).

For computers to learn and extrapolate information similarly, we either give them labelled examples so they can learn the mapping from inputs to answers, in a process called “supervised learning,” or we give them unlabelled data and let them discover the underlying structure or goal on their own, called “unsupervised learning.” With both methods, it’s a major challenge to encode the right rules or give the right format of answers for computers to see images.

Image Recognition

Image recognition is essentially a computer vision technique that gives “eyes” to computers for
them to “see” and understand the world through images and videos.

Image recognition models are trained to take in an image as input, deconstruct it down to its basic
form, then produce labels that categorize the image via a neural network (NN).



Images (input) → NN (layers) → Labels (output)

Example :

As an example, let’s train a model to recognize if an image is of the Eiffel Tower.

Here’s an example of what the model does in practice:

1. Input: Image of Eiffel Tower

2. Layers in NN: The model will first see the image as pixels, then detect the edges and contours of
its content. Finally, it will look at the whole object before producing a final guess about what the
model “sees.”

3. Output: Eiffel Tower (label)

Types of image recognition

The 3 classes of image recognition are:

1. Single class — one label per image (our example)

2. Multiclass — several labels per image (dog and cat in an image)

3. Binary classifiers — two classes (i.e. “Eiffel Tower” or “Not Eiffel Tower”)

The 5 steps to build an image classification model

1. Load and normalize the train and test data

2. Define the Convolutional Neural Network (CNN)

3. Define the loss function and optimizer

4. Train the model on the train data

5. Test the model on the test data
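These five steps map onto a short Keras sketch. The CIFAR-10 dataset and the tiny network below are stand-ins chosen for illustration, not a prescribed architecture:

from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import np_utils

# 1. Load and normalize the train and test data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = np_utils.to_categorical(y_train), np_utils.to_categorical(y_test)

# 2. Define the Convolutional Neural Network (CNN)
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

# 3. Define the loss function and optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# 4. Train the model on the train data
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# 5. Test the model on the test data
print(model.evaluate(x_test, y_test))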

We can build an image recognition model using traditional statistical approaches such as Support Vector Machines or Decision Trees, but the state-of-the-art method is with neural networks. The de facto algorithm used for image recognition is the convolutional neural network (CNN).

Recommender systems :
Recommender systems are machine learning systems that help users discover new products and services.

Recommender systems are an essential feature in our digital world, as users are often
overwhelmed by choice and need help finding what they're looking for.

A Recommender System refers to a system that is capable of predicting the future preference of a set of items for a user, and recommending the top items.

An important component of any of these systems is the recommender function, which takes
information about the user and predicts the rating that user might assign to a product.

Some examples of recommender systems in action include product recommendations on Amazon, Netflix suggestions for movies and TV shows in your feed, recommended videos on YouTube, music on Spotify, the Facebook newsfeed and Google Ads.

There are two methods to construct a recommender system :

Collaborative filtering methods

Collaborative methods for recommender systems are methods that are based solely on the past
interactions recorded between users and items in order to produce new recommendations.
These interactions are stored in the so-called “user-item interactions matrix”.



Then, the main idea that rules collaborative methods is that these past user-item interactions
are sufficient to detect similar users and/or similar items and make predictions based on these
estimated proximities.
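A toy sketch of this idea, using a small made-up user-item interaction matrix and cosine similarity between users to predict a missing rating:

import numpy as np

# made-up user-item interaction matrix (rows = users, columns = items, 0 = not rated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# similarity of user 0 to every user, based only on past interactions
sims = np.array([cosine(R[0], R[u]) for u in range(R.shape[0])])

# predict user 0's rating for item 2 as a similarity-weighted average of other users' ratings
others = [u for u in range(R.shape[0]) if u != 0 and R[u, 2] > 0]
pred = sum(sims[u] * R[u, 2] for u in others) / sum(sims[u] for u in others)
print(round(pred, 2))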

Content based methods

Content based approaches use additional information about users and/or items. The idea of content based methods is to build a model, based on the available “features”, that explains the observed user-item interactions. To recommend movies to a user, for example, such a method would also look at the profile (age, sex, …) of this user and, based on this information, determine relevant movies to suggest.

Social Network Graph:


Social network analysis (SNA), also known as network science, is a field of data analytics that uses
networks and graph theory to understand social structures. SNA techniques can also be applied to
networks outside of the societal realm.

In order to build SNA graphs, two key components are required: actors and relationships. A common
application of SNA techniques is with the internet. Web pages on the internet often link to other
webpages — either on their own website or another website. These links can be considered
relationships between actors (web pages). This is actually a key component of search engine
architecture.

What Does a Social Network Graph Look Like?

A social network graph contains points and lines connecting those points, similar to a connect-the-dots puzzle. The points represent the actors and the lines represent the relationships.



What Tools Do I Need To Get Started?

Like many things in data science, there are a variety of tools you can use to conduct SNA.

Gephi
This guide will use Gephi, a free software for Mac, PC, and Linux, in order to build network graphs
and run some analytics on them. Gephi provides a GUI interface and will not require any coding to
use.

Python/Excel
In order to build network graphs in Gephi, a specific data format must be used. In order to fit our
data into the correct format a tool must be used to create CSV files. With simple data, Excel should
suffice. However, when using large amounts of data or data that must have its relationships
extracted it is recommended to use Python. Don’t fret if you do not have any Python skills — you
should still be able to build some basic networks.

Data Source
You will also need a data source for your network. Network data have two requirements: actors and
relationships. Some data will require these relationships to be extracted, and others it will be more
explicit in the dataset. I recommend using datasets from Kaggle to get started.

Terminology of SNG:

Nodes and Edges :

In network science, actors are referred to as nodes (the dots on the graph) and relationships as
edges (the lines on the graph).



 Nodes can represent a variety of ‘actors’.
 Edges can represent a variety of ‘relationships’.

Edge Direction
There are two types of edges: directed and undirected.

 Directed edges are applied from one node to another with a starting node and an ending
node.

Undirected edges are the opposite of directed edges: the relationship runs both ways, with no distinct starting or ending node.

Edge Weight

An Edge’s weight is the number of times that edge appears between two specific nodes.

Degree

A node’s degree is the number of edges the node has.



Network-Level Measures

Network Size:
Network size is the number of nodes in the network. The size of a network does not take into
consideration the number of edges.

Network Density
Network density is the number of edges divided by the total possible edges. For example, a network
with Node A connected to Node B, and Node B connected to Node C, the network density is 2/3
because there are two edges out of a possible 3.



Path-Level Measures
Path-level measures provide information for a path between one node and another node. Paths
follow edges between nodes, known as hops. There are also many different path-level
measures, but this article will cover length and distance.

Length
Length is the number of edges between the starting and ending nodes, known as hops. In order
to calculate the length between two nodes, a path must be predetermined.

Distance
Distance is the number of edges or hops between the starting and ending nodes following the
shortest path. Unlike length, the distance between two nodes uses only the shortest path — the
path that requires the least hops.
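These network-, node- and path-level measures can also be computed in Python with the networkx library (assumed to be installed), using the small A-B-C example from the density section:

import networkx as nx

G = nx.Graph()               # undirected graph: nodes are actors, edges are relationships
G.add_edge('A', 'B')
G.add_edge('B', 'C')

print(G.number_of_nodes())                    # network size: 3
print(nx.density(G))                          # network density: 2 edges out of a possible 3
print(dict(G.degree()))                       # degree of each node: {'A': 1, 'B': 2, 'C': 1}
print(nx.shortest_path_length(G, 'A', 'C'))   # distance from A to C: 2 hops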

Implementation
Now that you have an understanding of social network analysis terms and concepts, let us apply these techniques to a dataset using the Gephi software.

Download and Install Gephi


First, download and install the Gephi software for the operating system your machine is
running.

Dataset
We are using the Marvel Universe Social Network dataset from Kaggle. After downloading the dataset, there will be three csv files: nodes, edges, and network. Open the file nodes.csv in Excel.



Loading Network Data into Gephi
 Open the Gephi software.
 Click on ‘New Project’. If you do not see the welcome screen, go to file>new project.
 Then, click the Data Laboratory tab.
 The data laboratory tab is where we will load in our edge and node list files. To import a
list click the import spreadsheet button.
 An import wizard will then step you through correctly importing the node list. Set
Separator to Comma, Import as to Nodes table, and Charset as UTF-8. Then click next.
 After clicking next, the wizard will provide additional setting configurations. Set Time
representation to Intervals. For Imported columns, check the node and type boxes and
set their data types to string. Then, click finish.

Congrats! You have just imported the node and edge lists! In the Data Laboratory, you can switch your view between these two lists by clicking on Nodes or Edges in the top left-hand corner.

Now that the data has been imported it is time to view the graph. Click on the overview tab.
