Python Unit 5
Deep Learning is a subfield of machine learning concerned with algorithms inspired by the
structure and function of the brain called artificial neural networks.
Its networks are capable of learning unsupervised from data that is unstructured or unlabelled.
It is also known as deep neural learning or deep neural networks.
As we construct larger neural networks and train them with more and more data, their
performance continues to increase.
This is generally different from other machine learning techniques, which reach a plateau in
performance.
Deep learning is used across all industries for a number of different tasks. Commercial apps that
use image recognition, open-source platforms with consumer recommendation apps, and
medical research tools that explore the possibility of reusing drugs for new ailments are a few of
the examples of deep learning incorporation.
Modern state-of-the-art deep learning is focused on training deep (many layered) neural
network models using the backpropagation algorithm. The most popular techniques are:
Multilayer Perceptron
A multi-layered perceptron (MLP) is one of the most common neural network models used in
the field of deep learning.
Often referred to as a “vanilla” neural network, an MLP is simpler than the complex models of
today’s era.
However, the techniques it introduced have paved the way for further advanced neural
networks.
The multilayer perceptron (MLP) is used for a variety of tasks, such as stock analysis, image
identification, spam detection, and election voting predictions.
Input Layer
This is the initial layer of the network, which takes in an input that will be used to produce an
output.
Hidden Layer(s)
The network needs to have at least one hidden layer. The hidden layer(s) perform computations
and operations on the input data to produce something meaningful.
Output Layer
Connections
The MLP is a feedforward neural network, which means that the data is transmitted from the
input layer to the output layer in the forward direction.
The connections between the layers are assigned weights. The weight of a connection specifies
its importance. This concept is the backbone of an MLP’s learning process.
While the inputs take their values from the surroundings, the values of all the other neurons are
calculated through a mathematical function involving the weights and values of the layer before
it.
h5=h1×w8+h2×w9
In a conventional MLP, random weights are assigned to all the connections. These random
weights propagate values through the network to produce the actual output.
Naturally, this output would differ from the expected output. The difference between the two
values is called the error.
Backpropagation refers to the process of sending this error back through the network,
readjusting the weights automatically so that eventually, the error between the actual and
expected output is minimized.
Input values
x1=0.05
x2=0.10
Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55
Bias values
b1=0.35 b2=0.60
Target values
T1=0.01
T2=0.99
Forward Pass
To find the value of H1, we multiply the input values by the weights and add the bias:
H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925
Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.
To find the value of y2, we multiply the outputs of H1 and H2 (0.593269992 and 0.596884378,
obtained by applying the sigmoid activation to 0.3775 and 0.3925) by the weights and add the bias b2:
y2=H1×w7+H2×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214
Our target values are 0.01 and 0.99. Our y1 and y2 values do not match the target values
T1 and T2.
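The forward pass above can be reproduced with a short sketch. It uses the inputs, weights, and biases from the worked example. Two points are assumptions rather than statements from the notes: the hidden outputs are obtained with a sigmoid activation (which matches the numbers 0.593269992 and 0.596884378 used for y2), and y1 is taken to use w5 and w6, by analogy with y2 using w7 and w8.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Values from the worked example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum of the inputs plus the bias.
h1_net = x1 * w1 + x2 * w2 + b1
h2_net = x1 * w3 + x2 * w4 + b1

# Assumption: a sigmoid activation turns the sums into hidden outputs.
h1_out = sigmoid(h1_net)
h2_out = sigmoid(h2_net)

# Output layer. Assumption: y1 uses w5 and w6, mirroring y2 with w7 and w8.
y1 = h1_out * w5 + h2_out * w6 + b2
y2 = h1_out * w7 + h2_out * w8 + b2

print(h1_net, h2_net)
print(y1, y2)
```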
Now, we will find the total error, which measures the difference between the outputs and the
target outputs.
In this way, the output of the current iteration becomes the input and affects the next output.
This is repeated until the correct output is produced. The weights at the end of the process
would be the ones on which the neural network works correctly.
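The repeated adjustment of weights can be illustrated with a deliberately tiny sketch: a single weight, a single input, and a squared-error measure. The squared error, the learning rate, and the function name `train_single_weight` are assumptions chosen for illustration. This is not the full backpropagation procedure for the network above, only the same idea of moving a weight against the error gradient.

```python
def train_single_weight(x, target, w=0.5, learning_rate=0.1, steps=200):
    """Repeatedly adjust one weight so that w * x approaches the target."""
    for _ in range(steps):
        output = w * x                  # forward pass
        error = output - target         # signed difference from the target
        gradient = error * x            # derivative of 0.5 * error**2 w.r.t. w
        w -= learning_rate * gradient   # move the weight against the gradient
    return w

# Hypothetical example: learn w so that w * 2.0 is close to 4.0.
w_final = train_single_weight(x=2.0, target=4.0)
print(w_final)
```

Each pass shrinks the remaining error by a constant factor here, so the weight settles near the value that reproduces the target.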
Loss functions:
In the context of an optimization algorithm, the function used to evaluate a candidate solution
is referred to as the objective function.
We may seek to maximize or minimize the objective function, meaning that we are searching for
a candidate solution that has the highest or lowest score respectively.
Typically, with neural networks, we seek to minimize the error. As such, the objective function is
often referred to as a cost function or a loss function and the value calculated by the loss
function is referred to as simply “loss.”
The cost function reduces all the various good and bad aspects of a possibly complex system
down to a single number, a scalar value, which allows candidate solutions to be ranked and
compared.
There are many functions that could be used to estimate the error of a set of weights in a neural
network.
Maximum likelihood seeks to find the optimum values for the parameters by maximizing a
likelihood function derived from the training data.
Under the maximum likelihood framework, the error between two probability distributions is
measured using cross-entropy.
Under maximum likelihood estimation, we would seek a set of model weights that minimize the
difference between the model’s predicted probability distribution given the dataset and the
distribution of probabilities in the training dataset. This is called the cross-entropy.
Our parametric model defines a distribution and we simply use the principle of maximum
likelihood. This means we use the cross-entropy between the training data and the model’s
predictions as the cost function.
Mean Squared Error loss, or MSE for short, is calculated as the average of the squared
differences between the predicted and actual values.
The result is never negative, regardless of the sign of the predicted and actual values, and a
perfect value is 0.0. The loss value is minimized, although it can be used in a maximization
optimization process by making the score negative.
Each predicted probability is compared to the actual class output value (0 or 1) and a score is
calculated that penalizes the probability based on the distance from the expected value. The
penalty is logarithmic.
Cross-entropy loss is minimized, where smaller values represent a better model than larger
values. A model that predicts perfect probabilities has a cross entropy or log loss of 0.0.
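Both losses described above can be written in a few lines. The sketch below is a plain-Python illustration for the binary case; the function names and the small clipping constant `eps` (used to avoid taking the logarithm of zero) are assumptions, not part of the notes.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(actual, predicted, eps=1e-15):
    """Average log loss for class labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(a * math.log(p) + (1 - a) * math.log(1.0 - p))
    return total / len(actual)

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

A confident wrong prediction is penalized much more heavily by the cross-entropy than a mildly wrong one, which is the logarithmic penalty mentioned above.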
Let us first understand these hyper-parameters: learning rate, batch size, momentum, and
weight decay.
These hyper-parameters act as knobs which can be tweaked during the training of the model.
For our model to provide the best results, we need to find the optimal values of these
hyper-parameters.
Underfitting is when the machine learning model is unable to reduce the error for either the
test or training set. An underfitting model is not powerful enough to fit the underlying
complexities of the data distributions.
Overfitting happens when the machine learning model is so powerful as to fit the training set
too well and the generalization error increases.
Hyperparameters can have a direct impact on the training of machine learning algorithms. Thus,
to achieve maximal performance, it is important to understand how to optimize them. Here are
some common strategies for optimizing hyperparameters:
1. Manual Hyperparameter Tuning
Traditionally, hyperparameters were tuned manually by trial and error. This is still commonly
done, and experienced engineers can “guess” parameter values that will deliver very high
accuracy for ML models. However, there is a continual search for better, faster, and more
automatic methods to optimize hyperparameters.
2. Grid Search
Grid search is arguably the most basic hyperparameter tuning method. With this technique, we
simply build a model for each possible combination of all of the hyperparameter values
provided, evaluate each model, and select the architecture which produces the best
results.
Grid search is not limited to one model type; it can be applied across machine learning
to find the best parameters to use for any given model.
For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two
hyperparameters that need to be optimized for good performance on unseen data: a
regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to
perform grid search, one selects a finite set of “reasonable” values for each, let’s say
Grid search then trains an SVM with each pair (C, γ) in the cartesian product of these two sets
and evaluates their performance on a held-out validation set (or by internal cross-validation on
the training set, in which case multiple SVMs are trained per pair). Finally, the grid search
algorithm outputs the settings that achieved the highest score in the validation procedure.
We then use the best set of hyperparameter values chosen in the grid search, in the actual
model as shown above.
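The grid search procedure can be sketched with the standard library alone. In this illustration `validation_score` is a hypothetical stand-in for "train a model with these settings and score it on a validation set", and the candidate values for C and γ are made up for the example; a real run would train an SVM for each pair.

```python
from itertools import product

def validation_score(c, gamma):
    """Hypothetical stand-in for a validation score; it peaks at c=10, gamma=0.1."""
    return -((c - 10) ** 2) - ((gamma - 0.1) ** 2)

def grid_search(c_values, gamma_values, score_fn):
    """Evaluate every (C, gamma) pair in the Cartesian product and keep the best."""
    best_params, best_score = None, float("-inf")
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if score > best_score:
            best_params, best_score = (c, gamma), score
    return best_params, best_score

# Illustrative candidate sets (assumed values).
best_params, best_score = grid_search([1, 10, 100], [0.01, 0.1, 1.0], validation_score)
print(best_params, best_score)
```

The number of evaluations is the product of the sizes of the candidate sets, which is why grid search becomes expensive as more hyperparameters are added.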
3. Random Search
Often some of the hyperparameters matter much more than others. Performing random search
rather than grid search allows a much more precise discovery of good values for the important
ones.
Random Search sets up a grid of hyperparameter values and selects random combinations to
train the model and score. This allows you to explicitly control the number of parameter
combinations that are attempted. The number of search iterations is set based on time or
resources. scikit-learn offers the RandomizedSearchCV class for this process.
The chances of finding a good set of parameters are comparatively higher with random search,
because the random sampling pattern lets the model end up being trained on well-performing
parameters without any aliasing. Random search works best for lower-dimensional search
spaces, since it then takes fewer iterations to find the right set.
In the case of deep learning algorithms, it often outperforms grid search.
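A minimal sketch of random search, using only the standard library, looks like this. The grid values and the `toy_score` function are hypothetical placeholders for a real validation score; `n_iter` is the explicit budget of combinations to try.

```python
import random

def random_search(param_grid, score_fn, n_iter=10, seed=0):
    """Try n_iter random combinations from param_grid and keep the best one."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Pick one value per hyperparameter at random.
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical stand-in for a validation score; higher is better."""
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 32) / 100

param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}
best_params, best_score = random_search(param_grid, toy_score, n_iter=5)
print(best_params, best_score)
```

Unlike grid search, the cost here is fixed by `n_iter` rather than by the size of the grid.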
4. Bayesian Optimization
A Recurrent Neural Network (RNN), which is a feedforward neural network (FFNN) with a time
twist, addresses this issue.
RNNs are a class of artificial neural networks in which connections between nodes form a directed
graph along a sequence. Links run from a layer back to previous layers, allowing information
to flow back into earlier parts of the network. Each step therefore depends on
past events, allowing information to persist.
In this way, RNNs can use their internal state (memory) to process sequences of inputs. This
makes them applicable to tasks such as unsegmented, connected handwriting recognition or
speech recognition.
They work not only on the information you feed them but also on related information from
the past, which means the order in which you feed the network matters: feeding it ‘chicken’
then ‘egg’ may give a different output compared to ‘egg’ then ‘chicken’.
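The order sensitivity described above can be seen in a toy recurrent step with a single hidden value. The weights and the numeric encodings of ‘chicken’ and ‘egg’ are arbitrary assumptions for illustration; a real RNN uses vectors and learned weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One recurrent step: the new state mixes the current input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence):
    """Feed a sequence through the recurrent step and return the final state."""
    h = 0.0  # initial internal state (memory)
    for x in sequence:
        h = rnn_step(x, h)
    return h

# Hypothetical numeric encodings of the two words.
chicken, egg = 1.0, -0.5
state_chicken_then_egg = run_rnn([chicken, egg])
state_egg_then_chicken = run_rnn([egg, chicken])
print(state_chicken_then_egg, state_egg_then_chicken)
```

The two final states differ because each step depends on the state left behind by the previous input.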
RNNs also have problems like vanishing (or exploding) gradient/long-term dependency problem
where information rapidly gets lost over time.
Actually, it’s the weight which gets lost when it reaches a value of 0 or 1 000 000, not the
neuron.
But in this case, the previous state won’t be very informative as it’s the weight which stores the
information from the past.
Long Short-Term Memory (LSTM) is a kind of recurrent neural network. In an RNN, the output from
the last step is fed as input to the current step.
LSTM was designed by Hochreiter & Schmidhuber. It tackles the long-term dependency problem
of RNNs, in which the RNN cannot predict a word stored in long-term
memory but can give more accurate predictions from recent information.
As the gap length increases, an RNN does not perform efficiently. LSTM can by default retain
information for a long period of time.
Structure Of LSTM:
LSTM has a chain structure that contains four neural networks and different memory blocks
called cells.
Information is retained by the cells and the memory manipulations are done by the gates. There
are three gates:
Forget Gate: The information that is no longer useful in the cell state is removed by the forget
gate.
Output Gate: The task of extracting useful information from the current cell state to be
presented as an output is done by the output gate.
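The gate mechanics can be sketched for a single LSTM cell working on scalar values. This is a simplified illustration under stated assumptions: real LSTMs use vectors and learned weight matrices, the `weights` dictionary layout is made up for the example, and the input gate and candidate value used below follow the standard LSTM formulation rather than anything spelled out in these notes. The four entries in `weights` correspond to the four small networks inside the cell.

```python
import math

def sigmoid(z):
    """Logistic function, used by the gates to produce values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM step; weights maps each part to a (w_x, w_h, bias) triple."""
    def unit(name, activation):
        w_x, w_h, b = weights[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = unit("forget", sigmoid)        # how much of the old cell state to keep
    i = unit("input", sigmoid)         # how much new information to write
    g = unit("candidate", math.tanh)   # the new information itself
    o = unit("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # output / hidden state
    return h, c

# Hypothetical weights: forget gate wide open, input gate nearly closed.
keep_weights = {
    "forget": (0.0, 0.0, 50.0),
    "input": (0.0, 0.0, -50.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, weights=keep_weights)
print(h, c)
```

With the forget gate open and the input gate closed, the cell state passes through almost unchanged, which is how an LSTM can retain information over long gaps.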
1. Machine Translation
2. Image Captioning
3. Handwriting generation
4. Question Answering Chatbots
Given a series of images or videos from the real world, an AI system that uses a Convolutional
Neural Network (CNN) learns to automatically extract the features of these inputs to complete a
specific task, e.g., image classification, face authentication, and image semantic segmentation.
Unlike the fully connected layers in MLPs, CNN models use one or more convolution layers
to extract simple features from the input by executing convolution operations.
Dept. Of C.S.E-C.B.I.T Page 13
Each layer is a set of nonlinear functions of weighted sums at different coordinates of spatially
nearby subsets of outputs from the prior layer, which allows the weights to be reused.
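The convolution operation can be sketched in plain Python. The function below slides a small kernel over a 2-D input and takes a weighted sum at each position, reusing the same weights everywhere. It omits padding, strides, channels, and the nonlinearity, and (like most deep learning libraries) does not flip the kernel; the name `conv2d_valid` and the sample values are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and return the weighted sums ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Weighted sum over the spatially nearby patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```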
Applying various convolutional filters, CNN machine learning models can capture the high-level
representation of the input data, making it most popular for computer vision tasks, such as
image classification (e.g., AlexNet, VGG network, ResNet, MobileNet) and object detection (e.g.,
Fast R-CNN, Mask R-CNN, YOLO, SSD).
AlexNet. The first CNN to win the ImageNet Challenge for image classification, in 2012,
AlexNet consists of five convolution layers and three fully connected layers. It requires
61 million weights and 724 million MACs (multiply-accumulate operations) to classify an
image of size 227×227.
VGG-16. To achieve higher accuracy, VGG-16 uses a deeper structure of 16 layers,
consisting of 13 convolution layers and three fully connected layers, and requires 138
million weights and 15.5G MACs to classify an image of size 224×224.
GoogLeNet. To improve accuracy while reducing the computation of DNN inference,
GoogLeNet introduces an inception module composed of different-sized filters. As a
result, GoogLeNet achieves better accuracy than VGG-16 while requiring only
seven million weights and 1.43G MACs to process an image of the same size.
ResNet. ResNet, the state-of-the-art effort, uses a “shortcut” structure to reach
human-level accuracy with a top-5 error rate below 5%. In addition, the “shortcut”
module helps solve the vanishing gradient problem during training,
making it possible to train a DNN model with a deeper structure.
Text analytics is the process of transforming unstructured text documents into usable,
structured data. Text analysis works by breaking apart sentences and phrases into their
components and then evaluating each part's role and meaning using complex software rules and
machine learning algorithms.
Decades ago, text analytics involved simple tasks like calculating word frequencies. Over the last
few years, artificial intelligence technologies like natural language understanding (NLU) and
machine learning, and techniques like deep learning have dramatically improved the
effectiveness of text analytics.
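The simplest of these tasks, counting word frequencies, can be sketched with the standard library. The tokenization rule (lowercase letters and apostrophes) is an assumption made for this example; real text analytics pipelines use more careful tokenizers.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Split text into lowercase words and count how often each one occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

frequencies = word_frequencies("The cat saw the other cat.")
print(frequencies.most_common(2))
```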
Text Analytics techniques can be understood as the processes that go into mining the text and
discovering insights from it. These text mining techniques generally employ different text mining
tools and applications for their execution.
1) Information Extraction:
This is the most popular text mining technique. Information extraction refers to the process of
extracting meaningful information from vast chunks of textual data. Whatever information is
extracted is then stored in a database for future access and retrieval.
2) Clustering:
Clustering is one of the most crucial text mining techniques. It seeks to identify intrinsic
structures in textual information and organize them into relevant subgroups or 'clusters' for
further analysis.
3) Summarization:
4) Categorization:
This text mining technique is a form of "supervised" learning wherein
natural-language texts are assigned to a predefined set of topics depending upon their content.
Thus, categorization is a Natural Language Processing (NLP) process of gathering text
documents and processing and analyzing them to uncover the right topics or indexes for each
document.
An image is basically a two-dimensional signal. The signal function is f(x,y), where x and
y give the position of a point and f(x,y) gives the pixel value at that point. A digital image is
therefore a two-dimensional array of numbers, which lie between 0 and 255 for an 8-bit
greyscale image.
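As a small illustration of this idea, a greyscale image can be held as a nested list and read through a function f(x, y). The pixel values are made up, and treating x as the column and y as the row is an assumed convention.

```python
# A tiny 2 x 3 greyscale "image": each number is one pixel intensity.
image = [
    [0, 64, 128],
    [32, 96, 255],
]

def f(x, y):
    """Pixel intensity at column x, row y (assumed convention)."""
    return image[y][x]

height, width = len(image), len(image[0])
print(height, width, f(2, 1))
```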
Let us get started with some basic Image related tasks in Python. We will make use of PIL.
PIL:
Installation:
pip install pillow
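import numpy as np
import matplotlib.pyplot as plt  # provides plt.imshow
%matplotlib inline
# lum1 is the single-channel image array prepared in an earlier step (not shown here)
imgplot = plt.imshow(lum1)
imgplot.set_cmap('nipy_spectral')  # apply the 'nipy_spectral' colormap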
Image output:
The reason for using colormaps is that, often in various applications and uses, having a uniform
Now, let us verify the dimensions of the car image we worked on earlier.
These are the dimensions we got earlier as well. So we can conclude that the image is 320*658.
from PIL import Image

# Relative path
img3 = Image.open("image1.jpg")
# Rotate by the given angle (in degrees, counter-clockwise)
img_rot = img3.rotate(180)
# Saved in the same relative location
img_rot.save("rotated_picture.jpg")
For video analysis, I’m taking a Tom and Jerry video and calculating the screen time
of both Tom and Jerry in it. Let me first summarize the steps we will follow
to solve this problem:
I. Import and read the video, extract frames from it, and save them as images
II. Label a few images for training the model
III. Build our model on training data
IV. Make predictions for the remaining images
V. Calculate the screen time of both TOM and JERRY
Just following these steps will help you in solving many such video related problems in deep
learning.
Let us start with importing all the necessary libraries. Go ahead and install the below libraries in
case you haven’t already:
Step – 1: Read the video, extract frames from it and save them as images
Now we will load the video and convert it into frames. We will first capture the video from the
given directory using the VideoCapture() function, and then we’ll extract frames from the video
and save them as an image using the imwrite() function. Let’s code it:
import math
import cv2

count = 0
videoFile = "Tom and jerry.mp4"
cap = cv2.VideoCapture(videoFile)  # capturing the video from the given path
frameRate = cap.get(cv2.CAP_PROP_FPS)  # frame rate (property id 5)
while cap.isOpened():
    frameId = cap.get(cv2.CAP_PROP_POS_FRAMES)  # current frame number (property id 1)
    ret, frame = cap.read()
    if not ret:
        break
    if frameId % math.floor(frameRate) == 0:  # keep one frame per second of video
        filename = "frame%d.jpg" % count
        count += 1
        cv2.imwrite(filename, frame)
cap.release()
print("Done!")
Done!
Once this process is complete, ‘Done!’ will be printed on the screen as confirmation that the
frames have been created.
Let us try to visualize an image (frame). We will first read the image using the imread() function
of matplotlib, and then plot it using the imshow() function.
img = plt.imread('frame0.jpg')  # read the first saved frame by its file name
plt.imshow(img)
Our task is to identify which image has TOM, and which image has JERRY.
A possible solution is to manually give labels to a few of the images and train the model on
them. Once the model has learned the patterns, we can use it to make predictions.
Tada! We now have the images with us. Remember, we need two things to train our model:
Since there are three classes, we will one hot encode them using the to_categorical() function
of keras.utils.
y = data.Class
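One-hot encoding itself is simple enough to sketch by hand. The helper below is an illustration of what such an encoding produces, not the Keras implementation; the name `one_hot` and the sample labels are assumptions.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded

# Hypothetical labels for three images, with three possible classes (0, 1, 2).
print(one_hot([0, 2, 1], num_classes=3))
```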
We will be using a VGG16 pretrained model which takes an input image of shape (224 X 224 X
3). Since our images are in a different size, we need to reshape all of them. We will use
the resize() function of skimage.transform to do this.
from skimage.transform import resize

image = []
for i in range(0, X.shape[0]):
    # resize each image to 224*224*3
    a = resize(X[i], preserve_range=True, output_shape=(224, 224)).astype(int)
    image.append(a)
X = np.array(image)
All the images have been reshaped to 224 X 224 X 3. But before passing any input to the model,
we must preprocess it as the model requires. Use the preprocess_input() function
of keras.applications.vgg16 to perform this step.
We also need a validation set to check the performance of the model on unseen images. We will
make use of the train_test_split() function of the sklearn.model_selection module to randomly
divide images into training and validation set.
The next step is to build our model. As mentioned, we shall be using the VGG16 pretrained
model for this task. Let us first import the required libraries to build the model:
We will now load the VGG16 pretrained model and store it as base_model:
We will make predictions using this model for X_train and X_valid, get the features, and then
use those features to retrain the model.
X_train = base_model.predict(X_train)
X_valid = base_model.predict(X_valid)
X_train.shape, X_valid.shape
The shape of X_train and X_valid is (208, 7, 7, 512), (90, 7, 7, 512) respectively. In order to pass
it to our neural network, we have to reshape it to 1-D.
We will now scale the extracted features by their maximum value, which helps the model to
converge faster.
X_valid = X_valid/X_train.max()
Finally, we will build our model. This step can be divided into 3 sub-steps:
from keras.models import Sequential
from keras.layers import Dense, InputLayer

model = Sequential()
model.add(InputLayer((7*7*512,))) # input layer
model.add(Dense(units=1024, activation='sigmoid')) # hidden layer
model.add(Dense(3, activation='softmax')) # output layer
Let’s check the summary of the model using the summary() function:
We have a hidden layer with 1,024 neurons and an output layer with 3 neurons (since we have 3
classes to predict). Now we will compile our model:
In the final step, we will fit the model and simultaneously also check its performance on the
unseen images, i.e., validation images:
In the next section, we will try to calculate the screen time of TOM and JERRY in a new video.
Next, we will import the images for testing and then reshape them as per the requirements of
the aforementioned pretrained model:
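test_image = []
for img_name in test.Image_ID:
    img = plt.imread('' + img_name)
    test_image.append(img)
test_img = np.array(test_image)

test_image = []
for i in range(0, test_img.shape[0]):
    a = resize(test_img[i], preserve_range=True, output_shape=(224, 224)).astype(int)
    test_image.append(a)
test_image = np.array(test_image)

# extract VGG16 features and flatten them to 1-D, as was done for the training images
test_image = base_model.predict(test_image)
test_image = test_image.reshape(test_image.shape[0], -1)
# predict_classes() was removed from recent Keras releases; argmax over the softmax output is equivalent
predictions = np.argmax(model.predict(test_image), axis=1)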
Recall that Class ‘1’ represents the presence of JERRY, while Class ‘2’ represents the presence of
TOM. We shall make use of the above predictions to calculate the screen time of both these
legendary characters:
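A minimal sketch of this calculation, assuming one extracted frame per second of video (as in the frame extraction step) and the class numbering recalled above, could look as follows. The prediction values are hypothetical sample data, not results from the model, and the helper name `screen_time_seconds` is made up for illustration.

```python
from collections import Counter

def screen_time_seconds(predictions, seconds_per_frame=1):
    """Convert per-frame class predictions into screen time per character."""
    counts = Counter(predictions)
    return {
        "JERRY": counts[1] * seconds_per_frame,  # class 1 = JERRY
        "TOM": counts[2] * seconds_per_frame,    # class 2 = TOM
    }

# Hypothetical predictions for eight frames (0 is assumed to mean neither character).
sample_predictions = [0, 1, 1, 2, 2, 2, 0, 1]
print(screen_time_seconds(sample_predictions))
```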
Image classification:
Image classification is the process of taking an input (like a picture) and outputting a class (like “cat”)
or a probability that the input is a particular class (“there’s a 90% probability that this input is a cat”).
For computers to learn and extrapolate information similarly, we either give them examples together
with the correct answers, in a process called “supervised learning,” or we give them unlabelled data
and let them find patterns on their own, in “unsupervised learning.” With both methods,
it’s a major challenge to provide the right examples, in the right format, for computers to learn to
see images.
Image Recognition
Image recognition is essentially a computer vision technique that gives “eyes” to computers for
them to “see” and understand the world through images and videos.
Image recognition models are trained to take in an image as input, deconstruct it down to its basic
form, then produce labels that categorize the image via a neural network (NN).
Example:
2. Layers in NN: The model will first see the image as pixels, then detect the edges and contours of
its content. Finally, it will look at the whole object before producing a final guess about what the
model “sees.”
3. Binary classifiers — two classes (i.e. “Eiffel Tower” or “Not Eiffel Tower”)
We can build an image recognition model using traditional statistical approaches such as
Support Vector Machines or Decision Trees, but the state-of-the-art method is with Neural
Networks.
Recommender systems:
Recommender systems are machine learning systems that help users discover new products and
services.
Recommender systems are an essential feature in our digital world, as users are often
overwhelmed by choice and need help finding what they're looking for.
A Recommender System refers to a system that is capable of predicting the future preference of
a user for a set of items and recommending the top items.
An important component of any of these systems is the recommender function, which takes
information about the user and predicts the rating that user might assign to a product.
Collaborative methods for recommender systems are methods that are based solely on the past
interactions recorded between users and items in order to produce new recommendations.
These interactions are stored in the so-called “user-item interactions matrix”.
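A collaborative method can be sketched on a tiny user-item interactions matrix. The version below is one simple user-based variant chosen for illustration: it scores unseen items by the ratings of other users, weighted by cosine similarity. The users, ratings, and function names are all made up, and a rating of 0 is assumed to mean "no interaction".

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(interactions, user, top_n=1):
    """Rank the items the user has not interacted with, using similar users' ratings."""
    target = interactions[user]
    scores = [0.0] * len(target)
    for other, ratings in interactions.items():
        if other == user:
            continue
        similarity = cosine_similarity(target, ratings)
        for item, rating in enumerate(ratings):
            if target[item] == 0:  # only score unseen items
                scores[item] += similarity * rating
    unseen = [item for item in range(len(target)) if target[item] == 0]
    return sorted(unseen, key=lambda item: scores[item], reverse=True)[:top_n]

# Hypothetical user-item interactions matrix (rows: users, columns: items 0..3).
interactions = {
    "alice": [5, 4, 0, 0],
    "bob":   [5, 5, 4, 0],
    "carol": [0, 0, 1, 5],
}
print(recommend(interactions, "alice"))
```

Only past interactions are used here; no profile information about the users or the items is needed.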
Content-based approaches use additional information about users and/or items. The idea of
content-based methods is to build a model, based on the available “features”, that explains
the observed user-item interactions. For a movie recommender, for example, we look at the
profile (age, sex, …) of a user and, based on this information, determine relevant movies to suggest.
In order to build social network analysis (SNA) graphs, two key components are required: actors and
relationships. A common application of SNA techniques is with the internet. Web pages on the internet
often link to other web pages, either on their own website or on another website. These links can be
considered relationships between actors (web pages). This is actually a key component of search engine
architecture.
A social network graph contains both points and lines connecting those points, similar to a
connect-the-dots puzzle. The points represent the actors and the lines represent the relationships. An
example of a social network graph can be seen below
Like many things in data science, there are a variety of tools you can use to conduct SNA.
Gephi
This guide will use Gephi, a free software for Mac, PC, and Linux, in order to build network graphs
and run some analytics on them. Gephi provides a GUI interface and will not require any coding to
use.
Python/Excel
In order to build network graphs in Gephi, a specific data format must be used. In order to fit our
data into the correct format a tool must be used to create CSV files. With simple data, Excel should
suffice. However, when using large amounts of data or data that must have its relationships
extracted it is recommended to use Python. Don’t fret if you do not have any Python skills — you
should still be able to build some basic networks.
Data Source
You will also need a data source for your network. Network data have two requirements: actors and
relationships. Some data will require these relationships to be extracted; in other datasets they
will be explicit. I recommend using datasets from Kaggle to get started.
Terminology of SNG:
In network science, actors are referred to as nodes (the dots on the graph) and relationships as
edges (the lines on the graph).
Edge Direction
There are two types of edges: directed and undirected.
Directed edges are applied from one node to another with a starting node and an ending
node.
Edge Weight
An Edge’s weight is the number of times that edge appears between two specific nodes.
Degree
Network Size:
Network size is the number of nodes in the network. The size of a network does not take into
consideration the number of edges.
Network Density
Network density is the number of edges divided by the total possible edges. For example, in a network
with Node A connected to Node B, and Node B connected to Node C, the network density is 2/3
because there are two edges out of a possible three.
Length
Length is the number of edges between the starting and ending nodes, known as hops. In order
to calculate the length between two nodes, a path must be predetermined.
Distance
Distance is the number of edges or hops between the starting and ending nodes following the
shortest path. Unlike length, the distance between two nodes uses only the shortest path — the
path that requires the least hops.
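Several of these terms can be computed directly from an edge list. The sketch below treats edges as undirected and uses the three-node example from the density definition; the function names are illustrative, and distance is found with a breadth-first search.

```python
from collections import Counter, deque

def edge_weights(edge_list):
    """Count how many times each undirected edge appears between two nodes."""
    return Counter(frozenset(edge) for edge in edge_list)

def network_density(nodes, edge_list):
    """Distinct undirected edges divided by the total possible edges."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    return len(edge_weights(edge_list)) / possible

def distance(edge_list, start, goal):
    """Fewest hops between two nodes, found by breadth-first search."""
    neighbours = {}
    for a, b in edge_list:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # no path connects the two nodes

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]
print(network_density(nodes, edges))
print(distance(edges, "A", "C"))
```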
Implementation
Now that you have an understanding of social network analysis terms and concepts, we can apply
these techniques to a dataset using the Gephi software.
Dataset
We are using the Marvel Universe Social Network dataset from Kaggle. After downloading the
dataset, you will have three CSV files: nodes, edges, and network. Open the file nodes.csv in
Excel.
Congrats! You have just imported the node and edge lists! In the data library, you can switch
your view between these two lists by clicking on Nodes or Edges in the top left-hand corner.
Now that the data has been imported it is time to view the graph. Click on the overview tab.