
NARASARAOPETA INSTITUTE OF TECHNOLOGY

Department of Computer Science and Engineering


DEEP LEARNING (III – AI&ML) – II SEM
UNIT III

Neural Networks: Anatomy of Neural Network, Introduction to Keras: Keras, TensorFlow, Theano and CNTK, Setting up Deep Learning Workstation, Classifying Movie Reviews: Binary Classification, Classifying Newswires: Multiclass Classification.

Anatomy of a neural network:

The anatomy of a neural network involves understanding its fundamental components, including
layers, neurons, weights, biases, and activation functions.

➢ Layers, which are combined into a network (or model).
➢ The input data and corresponding targets.
➢ The loss function, which defines the feedback signal used for learning.
➢ The optimizer, which determines how learning proceeds.
Relationship between the network, layers, loss function, and optimizer:

1. Layers: the building blocks of deep learning


Each layer in a neural network performs a specific computation on the input data and transforms
it in some way.
Densely Connected Layers (Dense Layers):
• Appropriate Data Type: Simple vector data stored in 2D tensors of shape (samples,
features).



• Description: Each neuron in a dense layer is connected to every neuron in the previous
layer.
Recurrent Layers (e.g., LSTM, GRU):
• Appropriate Data Type: Sequence data stored in 3D tensors of shape (samples, timesteps,
features).
• Description: Recurrent layers are designed to process sequential data with temporal
dependencies.
Convolutional Layers (Conv2D, Convolutional Neural Networks):
• Appropriate Data Type: Image data stored in 4D tensors of shape (samples, height, width,
channels).
• Description: Convolutional layers are specifically designed for processing grid-like data
such as images.

The following code creates a Dense layer in Keras with 32 units (neurons) and specifies an input shape of (784,).
• The input data should have a shape of (784,), which means each sample is a 1D array with 784 elements.
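A minimal sketch of this layer definition (the original listing is not preserved here, so this follows the standard Keras API):
from keras import layers
# A Dense layer with 32 output units, expecting 784-element input vectors
layer = layers.Dense(32, input_shape=(784,))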

Similarly, a Sequential model can stack two Dense layers, each with 32 neurons. The first layer specifies an input shape of (784,), while the second layer does not specify an input shape; it automatically infers it from the output shape of the previous layer.
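A matching sketch of the two-layer stack (standard Keras API):
from keras import models, layers
model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))   # infers its input shape from the previous layer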
2. Models: networks of layers
In deep learning, a model is typically represented as a directed, acyclic graph (DAG) of layers.
A network's topology defines its hypothesis space. Different network topologies serve different purposes and address different types of tasks:
a) Two-branch networks
b) Multihead networks
c) Inception blocks



a) Two-branch networks
These architectures involve splitting the input data into two separate branches, each processed by
its own set of layers, and then merging the outputs before producing the final output.

b) Multihead networks:
In multihead networks, the model has multiple output branches (heads), each producing its own
output prediction.

c) Inception blocks:
These blocks consist of multiple parallel convolutional branches with different filter sizes or
operations (e.g., 1x1, 3x3, 5x5 convolutions), followed by concatenation or merging of their
outputs.
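As an illustration of such topologies, here is a minimal two-branch network built with the Keras functional API; the input shapes and layer sizes are hypothetical:
from keras import layers, Input
from keras.models import Model
# Two hypothetical inputs, each processed by its own branch
left_input = Input(shape=(64,))
right_input = Input(shape=(32,))
left = layers.Dense(16, activation='relu')(left_input)
right = layers.Dense(16, activation='relu')(right_input)
# Merge the branch outputs before producing the final prediction
merged = layers.concatenate([left, right])
output = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], output)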



3. Loss functions and optimizers: keys to configuring the learning process:
Loss Functions:
A loss function quantifies how well the model's predictions match the true labels or targets. It
computes the difference between the predicted output and the actual output for a given input
sample.

Optimizers:
An optimizer is an algorithm that adjusts the weights of the neural network based on the
gradients of the loss function with respect to the weights. It determines how the weights are
updated during the training process.

A neural network that has multiple outputs may have multiple loss functions (one per output).
But the gradient-descent process must be based on a single scalar loss value; so, for multiloss
networks, all losses are combined (via averaging) into a single scalar quantity.
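For example, assuming a functional-API model with two outputs (a sketch, not a specific model defined above), Keras accepts one loss per output plus weights that combine them into the single scalar minimized by gradient descent:
# Hypothetical two-output model: one loss per output,
# combined into one scalar via the given weights
model.compile(optimizer='rmsprop',
              loss=['mse', 'binary_crossentropy'],
              loss_weights=[0.5, 0.5])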



TensorFlow:
TensorFlow is a Google product and one of the most widely used deep-learning tools in machine-learning and deep-neural-network research.

➢ It came onto the market on 9th November 2015 under the Apache License 2.0.
➢ It is built in such a way that it can easily run on multiple CPUs and GPUs, as well as on mobile operating systems.
➢ It provides wrappers in several languages, such as Java, C++, and Python.

Type:
Type describes the data type assigned to a Tensor's elements.
A user needs to consider the following activities for building a Tensor (sketched below):
➢ Build an n-dimensional array.
➢ Convert the n-dimensional array into a tensor.
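A minimal sketch of these two activities, assuming TensorFlow and NumPy are installed:
import numpy as np
import tensorflow as tf
# Build an n-dimensional array...
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
# ...and convert it to a tensor; dtype is the type assigned to its elements
t = tf.convert_to_tensor(arr, dtype=tf.float32)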

➢ TensorFlow provides a flexible system of Tensor operations, which enables users to build computations as graphs. The modular architecture allows users to reuse parts of a TensorFlow graph across various components of an application.

➢ TensorBoard is a visualization tool included in TensorFlow that allows users to visualize the graph and other metrics, such as weights and biases, during training.



Theano:
Theano is an open-source Python library that is widely used for performing mathematical operations on multi-dimensional arrays, building on SciPy and NumPy.

It utilizes GPUs for faster computation and efficiently computes gradients by building symbolic graphs automatically. It is well suited to numerically unstable expressions, because it can detect them and recompute them with more stable algorithms.
Using Theano:

➢ define expression

➢ compile expression

➢ execute expression
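A classic minimal example of this define/compile/execute workflow (assuming Theano is installed):
import theano
import theano.tensor as T

x = T.dscalar('x')            # define the expression: a scalar input...
y = x ** 2                    # ...and its square
f = theano.function([x], y)   # compile the expression into a callable
print(f(3.0))                 # execute it: prints 9.0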

CNTK

➢ Microsoft Cognitive Toolkit (CNTK) is an open-source deep-learning framework from Microsoft. It provides all the basic building blocks required to form a neural network.
➢ Models are trained using C++ or Python, and trained models can be loaded from C# or Java for making predictions.
➢ We can implement CNNs, FNNs, RNNs, Batch Normalisation, and Sequence-to-Sequence models with attention.
➢ It lets users add new user-defined core components on the GPU from Python.
➢ It also provides automatic hyperparameter tuning.
➢ We can implement Reinforcement Learning, Generative Adversarial Networks (GANs), and both supervised and unsupervised learning.
➢ For massive datasets, CNTK has built-in optimised readers.



Developing with Keras:

1. Define your training data: input tensors and target tensors.
2. Define a network of layers (or model) that maps your inputs to your targets.
3. Configure the learning process by choosing a loss function, an optimizer, and some metrics to monitor.
4. Iterate on your training data by calling the fit() method of your model.

There are two ways to define a model: using the Sequential class (only for linear stacks of layers,
which is the most common network architecture by far) or the functional API (for directed
acyclic graphs of layers, which lets you build completely arbitrary architectures).

The learning process is configured in the compilation step, where you specify the optimizer and
loss function(s) that the model should use, as well as the metrics you want to monitor during
training. Here’s an example with a single loss function, which is by far the most common case:
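A representative sketch of the compilation step (the original listing is not preserved; the model here is illustrative, and the call follows the standard Keras API):
from keras import models, layers, optimizers
# An illustrative model to compile
model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(1))
# Compilation: one optimizer, a single loss function, and a metric to monitor
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])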

Finally, the learning process consists of passing Numpy arrays of input data (and the corresponding target data) to the model via the fit() method:
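A sketch of the call; input_tensor and target_tensor stand in for your Numpy training data:
# Train for 10 epochs in mini-batches of 128 samples
model.fit(input_tensor, target_tensor, batch_size=128, epochs=10)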



Setting up a deep-learning workstation

➢ NVIDIA GPUs are preferred for deep learning tasks.
➢ Intel Core i7 or i9, or AMD Ryzen 7 or 9, processors are recommended.
➢ Deep learning models often require large amounts of memory, especially when dealing with
large datasets.
➢ Aim for at least 16GB to 32GB of RAM, or more if working with extremely large models or
datasets.
➢ Linux distributions like Ubuntu, CentOS, or Fedora are commonly used for deep learning
due to their stability and compatibility with various libraries and frameworks.
➢ Alternatively, Windows with WSL (Windows Subsystem for Linux) can also be used.

Deep Learning Frameworks:

• Install frameworks like TensorFlow, PyTorch, or Keras based on your preferences and
project requirements.

Development Environment:

• Set up an integrated development environment (IDE) such as Jupyter Notebook, JupyterLab, or VSCode for writing and running code.
• Install necessary libraries and packages using package managers like pip or conda.

Security:

• Implement security measures such as firewalls, antivirus software, and regular software
updates to protect your workstation from cyber threats.

1. Jupyter notebooks: the preferred way to run deep-learning experiments:

• Jupyter Notebooks are indeed a popular choice for running deep learning experiments.
• Jupyter Notebooks allow you to write and execute code in a cell-by-cell fashion, enabling
interactive development.
• Jupyter Notebooks support inline plotting, which allows you to visualize data, model
architectures, training curves, and other metrics directly within the notebook.



• Jupyter Notebooks combine code, text, and visualizations in a single document, making it
easy to document your experiments, including explanations of algorithms, code
comments, observations, and insights.
• Jupyter Notebooks seamlessly integrate with popular deep learning frameworks like
TensorFlow, PyTorch, and Keras, allowing you to import these libraries

2. Getting Keras running: two options:

Option 1: Use the Official EC2 Deep Learning AMI:

➢ Launch an EC2 instance using the Deep Learning AMI provided by AWS.
➢ Choose an instance type with appropriate CPU and memory resources for your
experiments.
➢ Install and configure Jupyter Notebooks on the EC2 instance.
➢ Start a Jupyter server and access it through your web browser.

Option 2: Install Everything on a Local Unix Workstation:

➢ Install the necessary dependencies including Python, TensorFlow, Keras, and other
libraries on your local Unix workstation.
➢ Once everything is set up, you can choose to run your Keras experiments either as
Jupyter notebooks or as a regular Python codebase directly on your local machine.

3. Running deep-learning jobs in the cloud: pros and cons:


Pros:
1. Scalability
2. Cost Efficiency
3. Ease of Use
4. Access to Specialized Hardware
5. Global Availability
Cons:
1. Cost Overruns
2. Data Privacy and Security Concerns
3. Dependence on Internet Connectivity
4. Vendor Lock-In
5. Limited Control over Infrastructure



4. What is the best GPU for deep learning?
➢ NVIDIA GeForce RTX 3090: Known for its high performance and large VRAM (24GB
GDDR6X), the RTX 3090 is favored for deep learning tasks requiring large datasets and
complex models.
➢ NVIDIA GeForce RTX 3080: offers a good balance of performance and price.
➢ NVIDIA GeForce RTX 3070
➢ NVIDIA A100 Tensor Core GPU
➢ AMD Radeon Instinct MI100
When choosing a GPU for deep learning, it's essential to consider factors such as performance
requirements, budget constraints, compatibility with deep learning frameworks (e.g.,
TensorFlow, PyTorch), and availability of drivers and software support.

Classifying movie reviews: a binary classification example


The goal is to classify movie reviews as positive or negative, based on the text content of the reviews.
1. The IMDB dataset:
➢ A set of 50,000 highly polarized reviews from the Internet Movie Database. They’re split
into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of
50% negative and 50% positive reviews.
➢ The IMDB dataset comes packaged with Keras

➢ Imports the imdb module from Keras’s datasets collection.
➢ load_data() function: a method that loads the dataset into memory.
➢ num_words=10000: This argument is particularly important as it specifies the maximum
number of the most frequent words to keep in the training data. Here, num_words=10000
means that only the top 10,000 most frequent words in the dataset will be kept.
The variables train_data and test_data are lists of reviews; each review is a list of word indices
(encoding a sequence of words). train_labels and test_labels are lists of 0s and 1s, where 0 stands
for negative and 1 stands for positive:
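The loading code these points describe:
from keras.datasets import imdb
# Keep only the 10,000 most frequent words in the training data
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)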



Here's how you can quickly decode one of these reviews back to English words:
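A sketch of the decoding step (indices are offset by 3 because 0, 1, and 2 are reserved for "padding", "start of sequence", and "unknown"):
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])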

2. Preparing the data:


➢ Tokenization: Convert the raw text into a sequence of integers. This involves assigning a
unique integer index to each word in the dataset.
➢ Padding: Ensure that all sequences have the same length. This is necessary because
neural networks require inputs of fixed dimensions. Sequences shorter than the specified
length are padded with zeros, while longer sequences are truncated.
➢ Data Splitting: Split the dataset into training and validation sets. The training set is used
to train the model, while the validation set is used to evaluate the model's performance
during training.
The function vectorize_sequences, sketched below, converts sequences of integers into a binary matrix representation suitable for input into a neural network.
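A sketch of the function, together with the label vectors used later for training:
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence; a 1 marks each word index present in it
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
# Labels as float vectors
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')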



Each row corresponds to a sequence, and each column corresponds to a word in the vocabulary.
If a word is present in a sequence, the corresponding element in the row is set to 1; otherwise, it
remains 0.
x_train and x_test will be binary matrix representations of the training and test data, respectively,
with each row representing a movie review and each column representing a word in the
vocabulary.

3. Building your network:


Building a neural network involves defining the architecture of the network, including the
number and type of layers, activation functions, and other parameters.



• Import the models and layers modules from the Keras library (the full model is sketched after this list).
• The Sequential model is a linear stack of layers, where each layer has exactly one input tensor and one output tensor.
• 16 specifies the number of units (neurons) in each hidden layer.
• The hidden layers use the ReLU (Rectified Linear Unit) activation function.
• The input shape is a one-dimensional array with 10,000 elements.
• Another Dense layer is added with the same configuration as the previous one.
• The output layer has a single neuron for this binary classification task. Its activation function is sigmoid, which squashes the output between 0 and 1 so it can be read as a probability.
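The model sketched from this description:
from keras import models, layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))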
Finally, you need to choose a loss function and an optimizer (the compilation step is sketched after this list):
• After defining the architecture of the neural network, you need to specify the loss function and optimizer used to train the model.
• Candidate losses include binary_crossentropy and mean_squared_error; crossentropy is usually the better choice for models that output probabilities.
• RMSprop (Root Mean Square Propagation) is a popular choice of optimizer for neural networks.
• Because you are performing binary classification (predicting either 0 or 1), binary_crossentropy is used as the loss function.
• The metric is 'accuracy', which measures the proportion of correctly classified samples.
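The compilation step:
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])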
4. Validating your approach
To validate the approach of building and training the neural network model, split the data: divide your dataset into training and validation sets.



Let’s set apart 1,000 samples in the training data to use as a validation set.
Setting aside a validation set:
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = y_train[:1000]
partial_y_train = y_train[1000:]
Now, let’s train the network for 20 epochs:
Training the model:
history = model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=512,
validation_data=(x_val, y_val))
And finally, let’s display its loss and accuracy curves
Plotting the training and validation loss:
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Plotting the training and validation accuracy:
plt.clf()
acc = history.history['acc']
val_acc = history.history['val_acc']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()



5. Generating predictions on new data
After training, you can generate the likelihood of reviews being positive by calling the predict method on new data:
predictions = model.predict(x_test)
Each entry in predictions is a single probability between 0 and 1: values close to 1 indicate a confidently positive review, and values close to 0 a confidently negative one.

Classifying newswires: a multiclass classification example


➢ Classifying newswires typically refers to the process of categorizing news articles or stories into different topics or themes.
➢ The goal is to build a network to classify Reuters newswires into 46 mutually exclusive topics. Because there are many classes, this problem is an instance of multiclass classification; and because each data point should be classified into only one category, it is more specifically an instance of single-label, multiclass classification.
➢ Common topics for classifying newswires include Politics, Sports, Business, Technology, Entertainment, Health, Science, Environment, Education, Lifestyle, etc.
The Reuters dataset:
Reuters dataset, a set of short newswires and their topics, published by Reuters in 1986. It’s a
simple, widely used toy dataset for text classification. There are 46 different topics; some topics
are more represented than others, but each topic has at least 10 examples in the training set.
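Loading the dataset (it comes packaged with Keras, with the same num_words restriction as before):
from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)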



The label associated with an example is an integer between 0 and 45—a topic index:
>>> train_labels[10]
3
2. Preparing the data:
Vectorize the data with exactly the same code as in the previous example.
Encoding the data:
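A sketch, reusing the vectorize_sequences function defined in the previous example:
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)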

To vectorize the labels, there are two possibilities: you can cast the label list as an integer tensor,
or you can use one-hot encoding. One-hot encoding is a widely used format for categorical data,
also called categorical encoding.



One-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place of the label index. Here's an example:
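A sketch of manual one-hot encoding (Keras also ships a built-in to_categorical utility that does the same thing):
def to_one_hot(labels, dimension=46):
    # Each label becomes an all-zero vector with a 1 at the label index
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
# Equivalent built-in:
# from keras.utils.np_utils import to_categorical
# one_hot_train_labels = to_categorical(train_labels)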

3. Building your network:


The following Keras network suits a multiclass classification problem such as classifying newswire topics into one of 46 classes (the model is sketched after the layer descriptions below). Its structure helps mitigate the risk of information bottlenecks, which can occur when a hidden layer is smaller than the output dimensionality.

Input Layer:
input_shape=(10000,) specifies that each input sample has 10,000 features. This is consistent with the vectorized input data, where each feature corresponds to one term in the vocabulary (each sample is a 10,000-dimensional binary vector).
The first Dense layer has 64 units, comfortably larger than the 46-class output, reducing the likelihood that this layer acts as a bottleneck.
Hidden Layer:
Another Dense layer with 64 units and 'relu' activation. This adds depth to the network, allowing
it to learn more complex patterns in the data. The 'relu' activation function is used for introducing
non-linearity into the model, helping it to learn more complex relationships in the data.
Output Layer:
A Dense layer with 46 units corresponds to the 46 different output classes.
The 'softmax' activation function is used because it outputs a probability distribution over the 46
classes, which is suitable for multiclass classification. Each output will give the probability that
the input belongs to one of the 46 classes.
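The model and its compilation, sketched from this description:
from keras import models, layers
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])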



4. Validating your approach:
Let’s set apart 1,000 samples in the training data to use as a validation set.
Setting aside a validation set:
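As in the binary example, but with the one-hot encoded labels:
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]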

Training the model:
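The training call mirrors the binary example:
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))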

Plotting the training and validation loss:
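The plotting code is the same as in the binary example:
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()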



Plotting the training and validation accuracy:
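And the accuracy curves:
plt.clf()
acc = history.history['acc']
val_acc = history.history['val_acc']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()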

5. Generating predictions on new data:


You can verify that the predict method of the model instance returns a probability distribution
over all 46 topics. Let’s generate topic predictions for all of the test data.
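Generating topic predictions:
predictions = model.predict(x_test)
Each entry in predictions is a vector of length 46:
>>> predictions[0].shape
(46,)
The coefficients in this vector sum to 1:
>>> np.sum(predictions[0])
1.0
The largest entry is the predicted class, the class with the highest probability:
>>> np.argmax(predictions[0])
4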

