DL Unit 4 Notes


Introduction to Keras:

Deep learning is one of the major subfields of the machine learning framework. Machine learning
is the study of the design of algorithms, inspired by the model of the human brain. Deep learning
is becoming more popular in data science fields like robotics, artificial intelligence (AI), audio
and video recognition, and image recognition. The artificial neural network is the core of deep
learning methodologies. Deep learning is supported by various libraries such as Theano,
TensorFlow, Caffe, MXNet, etc. Keras is one of the most powerful and easy-to-use Python
libraries for creating deep learning models; it is built on top of popular deep learning libraries
like TensorFlow and Theano.

Overview of Keras

Keras runs on top of open source machine learning libraries like TensorFlow, Theano or the Cognitive
Toolkit (CNTK). Theano is a Python library used for fast numerical computation. TensorFlow is
the most famous symbolic math library used for creating neural networks and deep learning
models. TensorFlow is very flexible, and its primary benefit is distributed computing. CNTK is a
deep learning framework developed by Microsoft. It can be used from Python, C# and C++
programs, or as a standalone machine learning toolkit. Theano and TensorFlow are very powerful
libraries, but they are difficult to understand when creating neural networks.

Keras is based on a minimal structure that provides a clean and easy way to create deep
learning models based on TensorFlow or Theano. Keras is designed to let you quickly define deep
learning models, which makes it an optimal choice for deep learning applications.

Features

Keras leverages various optimization techniques to make its high-level neural network API
easier to use and more performant. It supports the following features −

 Consistent, simple and extensible API.
 Minimal structure - easy to achieve the result without any frills.
 Support for multiple platforms and backends.
 User-friendly framework that runs on both CPU and GPU.
 Highly scalable computation.

Benefits

Keras is a highly powerful and dynamic framework that comes with the following
advantages −

 Larger community support.
 Easy to test.
 Keras neural networks are written in Python, which makes things simpler.
 Keras supports both convolutional and recurrent networks.
 Deep learning models are built from discrete components, so they can be combined in many ways.

Keras, TensorFlow, Theano, and CNTK:


Keras is a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow
framework. It is made with a focus on understanding deep learning techniques, such as creating
layers for neural networks while maintaining the concepts of shapes and mathematical details. A
model can be created with either of the following two APIs (a brief sketch of each follows the list) −

 Sequential API
 Functional API
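
Both styles build the same kind of model; here is a minimal sketch contrasting them (the layer sizes here are illustrative assumptions, not part of the MNIST example that follows) −

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Sequential API: layers are stacked one after another
seq_model = Sequential()
seq_model.add(Dense(32, activation = 'relu', input_shape = (784,)))
seq_model.add(Dense(10, activation = 'softmax'))

# Functional API: layers are called on tensors, which allows branches and multiple inputs/outputs
inputs = Input(shape = (784,))
x = Dense(32, activation = 'relu')(inputs)
outputs = Dense(10, activation = 'softmax')(x)
func_model = Model(inputs = inputs, outputs = outputs)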

Consider the following eight steps to create a deep learning model in Keras −

 Loading the data
 Preprocessing the loaded data
 Defining the model
 Compiling the model
 Fitting the specified model
 Evaluating the model
 Making the required predictions
 Saving the model

We will use the Jupyter Notebook for execution and display of output as shown below −

Step 1 − Loading the data and preprocessing the loaded data are implemented first to execute
the deep learning model.

import warnings
warnings.filterwarnings('ignore')

import numpy as np
np.random.seed(123) # for reproducibility

from keras.models import Sequential
from keras.layers import Flatten, MaxPool2D, Conv2D, Dense, Dropout
from keras.utils import np_utils      # importing Keras prints: Using TensorFlow backend.
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape to (samples, height, width, channels) and scale pixel values to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# One-hot encode the class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

This step can be described as “Import libraries and modules”, which means that all the libraries and
modules are imported as an initial step.
Step 2 − In this step, we will define the model architecture −

model = Sequential()
# Two 3x3 convolution layers, followed by 2x2 max pooling and dropout
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape = (28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation = 'relu'))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(0.25))
# Flatten the feature maps and classify with fully connected layers
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation = 'softmax'))

Step 3 − Let us now compile the specified model −

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

Step 4 − We will now fit the model using training data −

model.fit(X_train, Y_train, batch_size = 32, epochs = 10, verbose = 1)

The output of iterations created is as follows −

Epoch 1/10 60000/60000 [==============================] - 65s - loss: 0.2124 - acc: 0.9345
Epoch 2/10 60000/60000 [==============================] - 62s - loss: 0.0893 - acc: 0.9740
Epoch 3/10 60000/60000 [==============================] - 58s - loss: 0.0665 - acc: 0.9802
Epoch 4/10 60000/60000 [==============================] - 62s - loss: 0.0571 - acc: 0.9830
Epoch 5/10 60000/60000 [==============================] - 62s - loss: 0.0474 - acc: 0.9855
Epoch 6/10 60000/60000 [==============================] - 59s - loss: 0.0416 - acc: 0.9871
Epoch 7/10 60000/60000 [==============================] - 61s - loss: 0.0380 - acc: 0.9877
Epoch 8/10 60000/60000 [==============================] - 63s - loss: 0.0333 - acc: 0.9895
Epoch 9/10 60000/60000 [==============================] - 64s - loss: 0.0325 - acc: 0.9898
Epoch 10/10 60000/60000 [==============================] - 60s - loss: 0.0284 - acc: 0.9910
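
The remaining steps of the workflow (evaluate, predict, save) are not shown above; here is a minimal sketch of how they could look with the same model and variables (the exact scores will vary from run to run) −

# Step 5 − Evaluate the trained model on the test data
score = model.evaluate(X_test, Y_test, verbose = 0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Step 6 − Make predictions; each row is a probability distribution over the 10 digit classes
predictions = model.predict(X_test[:5])
print(predictions.argmax(axis = 1))

# Step 7 − Save the model architecture and weights to a single HDF5 file
model.save('mnist_cnn.h5')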
Difference between Keras, TensorFlow, Theano, and CNTK:

Keras, TensorFlow, Theano, and CNTK are all related to deep learning and neural networks,
but they serve different purposes and have different roles in the machine learning ecosystem.

1. Theano:
o Theano was an open-source numerical computation library for Python.
o Developed by the Montreal Institute for Learning Algorithms (MILA) at the
University of Montreal.
o Provided a low-level interface for tensor operations, which allowed for
efficient computation on CPUs and GPUs.
o Theano is no longer actively developed or supported as of September 2017.
2. TensorFlow:
o An open-source machine learning framework developed by the Google Brain
team.
o Offers a flexible platform for building and deploying machine learning
models, including neural networks.
o Provides both high-level APIs (like Keras) for quick development and low-
level APIs for more fine-grained control.
o Supports distributed computing and deployment on various platforms.
o TensorFlow is widely used in both research and industry.
3. Keras:
o Originally developed as a high-level neural networks API written in Python.
o Designed to be user-friendly, modular, and extensible.
o In the past, Keras could be used with different backends, including
TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK).
o Since TensorFlow version 2.0, Keras has been integrated as the official high-
level API for TensorFlow, making it the default choice for most TensorFlow
users.
o Keras provides a simple and consistent interface for building and training
neural networks.
4. Microsoft Cognitive Toolkit (CNTK):
o Developed by Microsoft, CNTK is an open-source deep learning framework.
o Designed for efficient training and evaluation of deep neural networks.
o Supports both high-level and low-level APIs.
o CNTK is particularly known for its efficiency in handling large datasets and
complex neural network architectures.
o While CNTK was once an option as a backend for Keras, it is not as
commonly used as TensorFlow in the broader deep learning community.

In summary:

 TensorFlow is a comprehensive and widely used deep learning framework.
 Keras is a high-level neural networks API that has been integrated into TensorFlow.
 Theano is an older library that is no longer actively developed.
 CNTK is a deep learning framework developed by Microsoft, but it is not as commonly used as TensorFlow.
What Are Recurrent Neural Networks (RNN)?

Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to
process sequences of data. They work especially well for tasks involving sequences, such as
time series data, speech, natural language, and similar problems.

An RNN works on the principle of saving the output of a particular layer and feeding it back to
the input in order to predict the output of the layer.

Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural
Network:

Fig: Simple Recurrent Neural Network

The nodes in different layers of the neural network are compressed to form a single layer of
recurrent neural networks. A, B, and C are the parameters of the network.

Fig: Fully connected Recurrent Neural Network

Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C
are the network parameters used to improve the output of the model. At any given time t, the
current input is a combination of the input at x(t) and the previous input x(t-1). The output at
any given time is fed back into the network to improve the output.

Fig: Fully connected Recurrent Neural Network

Now that you understand what a recurrent neural network is, let’s look at the different types of
recurrent neural networks.

Why Recurrent Neural Networks?

RNNs were created because there were a few issues with the feed-forward neural network:

 Cannot handle sequential data
 Considers only the current input
 Cannot memorize previous inputs

The solution to these issues is the RNN. An RNN can handle sequential data, accepting the
current input as well as previously received inputs. RNNs can memorize previous inputs due
to their internal memory.

How Do Recurrent Neural Networks Work?

In Recurrent Neural networks, the information cycles through a loop to the middle hidden
layer.
Fig: Working of Recurrent Neural Network

The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to
the middle layer.

The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation
functions, weights, and biases. In a plain feed-forward network, the parameters of the different
hidden layers are independent of each other and the network has no memory of earlier inputs;
when memory of earlier inputs is needed, you can use a recurrent neural network.

The Recurrent Neural Network standardizes the activation functions, weights, and biases so
that each hidden layer has the same parameters. Then, instead of creating multiple hidden
layers, it creates one and loops over it as many times as required.
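
To make this looping concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy (the weight names Wxh, Whh, Why and the sizes are illustrative assumptions, not taken from the figures) −

import numpy as np

# Illustrative sizes: 4-dimensional inputs, 8 hidden units, 3 output values
input_dim, hidden_dim, output_dim = 4, 8, 3
rng = np.random.default_rng(0)
Wxh = rng.normal(size = (hidden_dim, input_dim)) * 0.1   # input-to-hidden weights
Whh = rng.normal(size = (hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden (recurrent) weights
Why = rng.normal(size = (output_dim, hidden_dim)) * 0.1  # hidden-to-output weights

def rnn_forward(inputs):
    # The same cell is applied at every time step
    h = np.zeros(hidden_dim)            # initial hidden state (the "memory")
    outputs = []
    for x in inputs:                    # one loop iteration per time step
        h = np.tanh(Wxh @ x + Whh @ h)  # new state depends on current input and previous state
        outputs.append(Why @ h)         # output computed from the current state
    return outputs, h

sequence = rng.normal(size = (5, input_dim))  # a toy sequence of 5 time steps
outputs, final_state = rnn_forward(sequence)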

Feed-Forward Neural Networks vs Recurrent Neural Networks

A feed-forward neural network allows information to flow only in the forward direction, from
the input nodes, through the hidden layers, and to the output nodes. There are no cycles or
loops in the network.

Below is a simplified representation of a feed-forward neural network:

Fig: Feed-forward Neural Network


In a feed-forward neural network, the decisions are based only on the current input. It doesn’t
memorize past data, and there is no notion of future context. Feed-forward neural networks are
used in general regression and classification problems.

Applications of Recurrent Neural Networks

Image Captioning

RNNs are used to caption an image by analyzing the activities present in it.

Time Series Prediction

Any time series problem, like predicting the prices of stocks in a particular month, can be
solved using an RNN.

Natural Language Processing

Text mining and Sentiment analysis can be carried out using an RNN for Natural Language
Processing (NLP).

Machine Translation

Given an input in one language, RNNs can be used to translate the input into a different
language as output.
Advantages of Recurrent Neural Network

Recurrent Neural Networks (RNNs) have several advantages over other types of neural
networks, including:

Ability To Handle Variable-Length Sequences

RNNs are designed to handle input sequences of variable length, which makes them well-
suited for tasks such as speech recognition, natural language processing, and time series
analysis.

Memory Of Past Inputs

RNNs have a memory of past inputs, which allows them to capture information about the
context of the input sequence. This makes them useful for tasks such as language modeling,
where the meaning of a word depends on the context in which it appears.

Parameter Sharing

RNNs share the same set of parameters across all time steps, which reduces the number of
parameters that need to be learned and can lead to better generalization.

Non-Linear Mapping

RNNs use non-linear activation functions, which allows them to learn complex, non-linear
mappings between inputs and outputs.

Sequential Processing

RNNs process input sequences one step at a time, which is a natural fit for data where the
order of the elements matters, such as text, speech, and time series.

Flexibility

RNNs can be adapted to a wide range of tasks and input types, including text, speech, and
image sequences.

Improved Accuracy

RNNs have been shown to achieve state-of-the-art performance on a variety of sequence
modeling tasks, including language modeling, speech recognition, and machine translation.

These advantages make RNNs a powerful tool for sequence modeling and analysis, and have
led to their widespread use in a variety of applications, including natural language processing,
speech recognition, and time series analysis.

Disadvantages of Recurrent Neural Network

Although Recurrent Neural Networks (RNNs) have several advantages, they also have some
disadvantages. Here are some of the main disadvantages of RNNs:
Vanishing And Exploding Gradients

RNNs can suffer from the problem of vanishing or exploding gradients, which can make it
difficult to train the network effectively. This occurs when the gradients of the loss function
with respect to the parameters become very small or very large as they propagate through
time.

Computational Complexity

RNNs can be computationally expensive to train, especially when dealing with long
sequences. This is because the network has to process each input in sequence, which can be
slow.

Difficulty In Capturing Long-Term Dependencies

Although RNNs are designed to capture information about past inputs, they can struggle to
capture long-term dependencies in the input sequence. This is because the gradients can
become very small as they propagate through time, which can cause the network to forget
important information.

Lack Of Parallelism

RNNs are inherently sequential, which makes it difficult to parallelize the computation. This
can limit the speed and scalability of the network.

Difficulty In Choosing The Right Architecture

There are many different variants of RNNs, each with its own advantages and disadvantages.
Choosing the right architecture for a given task can be challenging, and may require
extensive experimentation and tuning.

Difficulty In Interpreting The Output

The output of an RNN can be difficult to interpret, especially when dealing with complex
inputs such as natural language or audio. This can make it difficult to understand how the
network is making its predictions.

These disadvantages are important when deciding whether to use an RNN for a given task.
However, many of these issues can be addressed through careful design and training of the
network and through techniques such as regularization and attention mechanisms.

Types of Recurrent Neural Networks

There are four types of Recurrent Neural Networks:

1. One to One
2. One to Many
3. Many to One
4. Many to Many
One to One RNN

This type of neural network is known as the Vanilla Neural Network. It's used for general
machine learning problems, which has a single input and a single output.

One to Many RNN

This type of neural network has a single input and multiple outputs. An example of this is
image captioning.

Many to One RNN

This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a
good example of this kind of network where a given sentence can be classified as expressing
positive or negative sentiments.
Many to Many RNN

This RNN takes a sequence of inputs and generates a sequence of outputs. Machine
translation is one of the examples.
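
As an illustration of the many-to-one case, here is a minimal Keras sketch of a sequence classifier in the style of sentiment analysis (the vocabulary size, sequence length, and layer sizes are illustrative assumptions) −

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

vocab_size, seq_len = 10000, 100   # illustrative values

model = Sequential()
# Map word indices to dense vectors
model.add(Embedding(vocab_size, 32, input_length = seq_len))
# Read the whole sequence and keep only the final hidden state (many inputs -> one output)
model.add(SimpleRNN(32))
# A single sigmoid unit: positive vs. negative sentiment
model.add(Dense(1, activation = 'sigmoid'))

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.summary()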

Two Issues of Standard RNNs

1. Vanishing Gradient Problem

Recurrent Neural Networks enable you to model time-dependent and sequential data
problems, such as stock market prediction, machine translation, and text generation. You will
find, however, that RNNs are hard to train because of the gradient problem.

RNNs suffer from the problem of vanishing gradients. The gradients carry information used
in the RNN, and when the gradient becomes too small, the parameter updates become
insignificant. This makes the learning of long data sequences difficult.

2. Exploding Gradient Problem

While training a neural network, if the slope tends to grow exponentially instead of decaying,
this is called an exploding gradient. This problem arises when large error gradients
accumulate, resulting in very large updates to the neural network model weights during the
training process.
Long training times, poor performance, and bad accuracy are the major issues caused by
gradient problems.

Gradient Problem Solutions

Now, let’s discuss the most popular and efficient way to deal with gradient problems,
i.e., Long Short-Term Memory Network (LSTMs).

First, let’s understand Long-Term Dependencies.

Suppose you want to predict the last word in the text: “The clouds are in the ______.”

The most obvious answer to this is the “sky.” We do not need any further context to predict
the last word in the above sentence.

Consider this sentence: “I have been staying in Spain for the last 10 years…I can speak fluent
______.”

The word you predict will depend on the previous few words in context. Here, you need the
context of Spain to predict the last word in the text, and the most suitable answer to this
sentence is “Spanish.” The gap between the relevant information and the point where it's
needed may have become very large. LSTMs help you solve this problem.

Common Activation Functions

Recurrent Neural Networks (RNNs) use activation functions just like other neural networks
to introduce non-linearity to their models. Here are some common activation functions used
in RNNs:

Sigmoid Function:

The sigmoid function is commonly used in RNNs. It has a range between 0 and 1, which
makes it useful for binary classification tasks. The formula for the sigmoid function is:

σ(x) = 1 / (1 + e^(-x))
Hyperbolic Tangent (Tanh) Function:

The tanh function is also commonly used in RNNs. It has a range between -1 and 1, which
makes it useful for non-linear classification tasks. The formula for the tanh function is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Rectified Linear Unit (Relu) Function:

The ReLU function is a non-linear activation function that is widely used in deep neural
networks. It has a range between 0 and infinity, which makes it useful for models that require
positive outputs. The formula for the ReLU function is:

ReLU(x) = max(0, x)

Leaky Relu Function:

The Leaky ReLU function is similar to the ReLU function, but it introduces a small slope to
negative values, which helps to prevent "dead neurons" in the model. The formula for the
Leaky ReLU function is:

Leaky ReLU(x) = max(0.01x, x)

Softmax Function:

The softmax function is often used in the output layer of RNNs for multi-class classification
tasks. It converts the network output into a probability distribution over the possible classes.
The formula for the softmax function is:

softmax(x_i) = e^(x_i) / ∑_j e^(x_j)

These are just a few examples of the activation functions used in RNNs. The choice of
activation function depends on the specific task and the model's architecture.
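
A short NumPy sketch of these functions, purely for illustration (the max-subtraction in the softmax is a standard numerical-stability trick, not part of the formula above) −

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # range (0, 1)

def tanh(x):
    return np.tanh(x)                   # range (-1, 1)

def relu(x):
    return np.maximum(0, x)             # range [0, infinity)

def leaky_relu(x, alpha = 0.01):
    return np.maximum(alpha * x, x)     # small slope for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))           # subtract max for numerical stability
    return e / e.sum()                  # values sum to 1 (a probability distribution)

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), softmax(x), sep = "\n")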

Backpropagation Through Time

Backpropagation through time is when we apply the backpropagation algorithm to a recurrent
neural network that has time series data as its input.

In a typical RNN, one input is fed into the network at a time, and a single output is obtained.
But in backpropagation through time, you use the current as well as the previous inputs as input.
This is called a timestep, and one timestep will consist of many time series data points entering
the RNN simultaneously.

Once the neural network has trained on a time set and given you an output, that output is used
to calculate and accumulate the errors. After this, the network is rolled back up and the weights
are recalculated and updated, keeping the errors in mind.
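
A minimal NumPy sketch of the idea, building on the vanilla RNN forward pass shown earlier (the sizes, the random toy data, and the squared-error loss are illustrative assumptions): the forward pass stores every hidden state, and the backward pass walks back through the time steps, accumulating the gradients −

import numpy as np

input_dim, hidden_dim, output_dim, T = 4, 8, 3, 5
rng = np.random.default_rng(1)
Wxh = rng.normal(size = (hidden_dim, input_dim)) * 0.1
Whh = rng.normal(size = (hidden_dim, hidden_dim)) * 0.1
Why = rng.normal(size = (output_dim, hidden_dim)) * 0.1

xs = rng.normal(size = (T, input_dim))        # toy input sequence
targets = rng.normal(size = (T, output_dim))  # toy target outputs

# Forward pass: keep every hidden state so the backward pass can revisit each time step
hs = {-1: np.zeros(hidden_dim)}
ys = {}
for t in range(T):
    hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t-1])
    ys[t] = Why @ hs[t]

# Backward pass (BPTT): roll back through time, accumulating gradients at every step
dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
dh_next = np.zeros(hidden_dim)
for t in reversed(range(T)):
    dy = ys[t] - targets[t]          # gradient of a squared-error loss at step t
    dWhy += np.outer(dy, hs[t])
    dh = Why.T @ dy + dh_next        # gradient arrives from the output and from the future
    draw = (1.0 - hs[t] ** 2) * dh   # backpropagate through the tanh non-linearity
    dWxh += np.outer(draw, xs[t])
    dWhh += np.outer(draw, hs[t-1])
    dh_next = Whh.T @ draw           # pass the gradient to the previous time step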

Variant RNN Architectures


There are several variant RNN architectures that have been developed over the years to
address the limitations of the standard RNN architecture. Here are a few examples:

Long Short-Term Memory (LSTM) Networks

LSTM is a type of RNN that is designed to handle the vanishing gradient problem that can
occur in standard RNNs. It does this by introducing three gating mechanisms that control the
flow of information through the network: the input gate, the forget gate, and the output gate.
These gates allow the LSTM network to selectively remember or forget information from the
input sequence, which makes it more effective for long-term dependencies.

Gated Recurrent Unit (GRU) Networks

GRU is another type of RNN that is designed to address the vanishing gradient problem. It
has two gates: the reset gate and the update gate. The reset gate determines how much of the
previous state should be forgotten, while the update gate determines how much of the new
state should be remembered. This allows the GRU network to selectively update its internal
state based on the input sequence.

Bidirectional RNNs:

Bidirectional RNNs are designed to process input sequences in both forward and backward
directions. This allows the network to capture both past and future context, which can be
useful for speech recognition and natural language processing tasks.

Encoder-Decoder RNNs:

Encoder-decoder RNNs consist of two RNNs: an encoder network that processes the input
sequence and produces a fixed-length vector representation of the input and a decoder
network that generates the output sequence based on the encoder's representation. This
architecture is commonly used for sequence-to-sequence tasks such as machine translation.

Attention Mechanisms

Attention mechanisms are a technique that can be used to improve the performance of RNNs
on tasks that involve long input sequences. They work by allowing the network to attend to
different parts of the input sequence selectively rather than treating all parts of the input
sequence equally. This can help the network focus on the input sequence's most relevant parts
and ignore irrelevant information.

These are just a few examples of the many variant RNN architectures that have been
developed over the years. The choice of architecture depends on the specific task and the
characteristics of the input and output sequences.
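
For orientation, here is how some of these variants appear as drop-in layers in Keras (the layer sizes and the input shape of 100 time steps with 8 features per step are illustrative assumptions) −

from keras.models import Sequential
from keras.layers import LSTM, GRU, Bidirectional, Dense

# The same many-to-one skeleton, with only the recurrent layer swapped out:

lstm_model = Sequential([
    LSTM(64, input_shape = (100, 8)),                 # LSTM cell with input, forget and output gates
    Dense(1, activation = 'sigmoid'),
])

gru_model = Sequential([
    GRU(64, input_shape = (100, 8)),                  # GRU: reset and update gates, fewer parameters
    Dense(1, activation = 'sigmoid'),
])

bi_model = Sequential([
    Bidirectional(LSTM(64), input_shape = (100, 8)),  # processes the sequence in both directions
    Dense(1, activation = 'sigmoid'),
])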

Long Short-Term Memory Networks

LSTMs are a special kind of RNN, capable of learning long-term dependencies; remembering
information for long periods is their default behavior.
All RNNs have the form of a chain of repeating modules of a neural network. In standard
RNNs, this repeating module has a very simple structure, such as a single tanh layer.

Fig: Long Short Term Memory Networks

LSTMs also have a chain-like structure, but the repeating module has a different structure.
Instead of a single neural network layer, there are four interacting layers communicating in a
special way.

Workings of LSTMs in RNN


LSTMs work in a 3-step process.

Step 1: Decide How Much Past Data It Should Remember

The first step in the LSTM is to decide which information should be omitted from the cell in
that particular time step. The sigmoid function determines this. It looks at the previous state
(ht-1) along with the current input xt and computes the function.

Consider the following two sentences:

Let the output of h(t-1) be “Alice is good in Physics. John, on the other hand, is good at
Chemistry.”

Let the current input at x(t) be “John plays football well. He told me yesterday over the phone
that he had served as the captain of his college football team.”

The forget gate realizes there might be a change in context after encountering the first full
stop. It compares with the current input sentence at x(t). The next sentence talks about John,
so the information on Alice is deleted. The position of the subject is vacated and assigned to
John.

Step 2: Decide How Much This Unit Adds to the Current State

In the second layer, there are two parts. One is the sigmoid function, and the other is the tanh
function. The sigmoid function decides which values to let through (0 or 1). The tanh function
gives weightage to the values that are passed, deciding their level of importance (-1 to 1).
With the current input at x(t), the input gate analyzes the important information — John plays
football, and the fact that he was the captain of his college team is important.

“He told me yesterday over the phone” is less important; hence it's forgotten. This process of
adding some new information can be done via the input gate.

Step 3: Decide What Part of the Current Cell State Makes It to the Output

The third step is to decide what the output will be. First, we run a sigmoid layer, which
decides what parts of the cell state make it to the output. Then, we put the cell state through
tanh to push the values to be between -1 and 1 and multiply it by the output of the sigmoid
gate.

Let’s consider this example to predict the next word in the sentence: “John played
tremendously well against the opponent and won for his team. For his contributions, brave
____ was awarded player of the match.”

There could be many choices for the empty space. The current input brave is an adjective,
and adjectives describe a noun. So, “John” could be the best output after brave.
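
The three steps above correspond to the standard LSTM gate equations, written here in the same plain-text style as the activation formulas earlier. W and b denote the learned weights and biases, [h(t-1), x(t)] is the concatenation of the previous hidden state and the current input, and * is element-wise multiplication:

f(t) = σ(W_f · [h(t-1), x(t)] + b_f)        (forget gate, Step 1)
i(t) = σ(W_i · [h(t-1), x(t)] + b_i)        (input gate, Step 2)
C~(t) = tanh(W_C · [h(t-1), x(t)] + b_C)    (candidate cell state, Step 2)
C(t) = f(t) * C(t-1) + i(t) * C~(t)         (updated cell state)
o(t) = σ(W_o · [h(t-1), x(t)] + b_o)        (output gate, Step 3)
h(t) = o(t) * tanh(C(t))                    (new hidden state / output, Step 3)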

LSTM Use Case

Now that you understand how LSTMs work, let’s do a practical implementation to predict the
prices of stocks using the “Google stock price” data.

Based on the stock price data between 2012 and 2016, we will predict the stock prices of
2017. The workflow consists of the following steps; a condensed code sketch follows the list.

1. Import the required libraries
2. Import the training dataset
3. Perform feature scaling to transform the data
4. Create a data structure with 60 time steps and 1 output
5. Import the Keras library and its packages
6. Initialize the RNN
7. Add the LSTM layers and some dropout regularization
8. Add the output layer
9. Compile the RNN
10. Fit the RNN to the training set
11. Load the stock price test data for 2017
12. Get the predicted stock price for 2017
13. Visualize the results of the predicted and real stock price