Chapter 3

Deep Learning

Dr. Muhammad Aqib


University Institute of Information Technology
PMAS-Arid Agriculture University Rawalpindi
Getting Started with Neural Networks



Contents

 Anatomy of a neural network


 Layers: the building blocks of deep learning
 Models: Networks of layers
 Loss function and optimizers
 Introduction to Keras
 Keras, TensorFlow, Theano, and CNTK
 Developing with Keras: A Quick Overview
 Setting up a deep learning workstation
 Classifying movie reviews: A binary classification example
 Predicting house prices: A regression example



Anatomy of a Neural Network

 Training a neural network revolves around the following objects:


 Layers, which are combined into a network (or model).
 The input data and corresponding targets.
 The loss function, which defines the feedback signal used for learning.
 The optimizer, which determines how learning proceeds.
 The network, composed of layers that are chained together, maps the input
data to predictions.
 The loss function then compares these predictions to the targets, producing a
loss value.
 A measure of how well the network’s predictions match what was expected.
 The optimizer uses this loss value to update the network’s weights.



Anatomy of a Neural Network (cont.)

(Figure: relationship between the network, layers, loss function, and optimizer.)


Layers: The Building Blocks of Deep Learning

 The fundamental data structure in neural networks is the layer.


 A layer is a data-processing module that takes as input one or more tensors
and that outputs one or more tensors.
 Some layers are stateless, but more frequently layers have a state: the
layer’s weights.
 Different layers are appropriate for different tensor formats and different types
of data processing, as the sketch after this list illustrates.
 Simple vector data, stored in 2D tensors of shape (samples, features), is often
processed by densely connected layers, also called fully connected or dense
layers (the Dense class in Keras).
 Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is
typically processed by recurrent layers such as an LSTM layer.
 Image data, stored in 4D tensors, is usually processed by 2D convolution layers
(Conv2D).
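 A minimal sketch (with illustrative shapes and layer sizes, not taken from the
slides) of how these three tensor formats map to Keras layer classes:

from keras import layers

# 2D tensors (samples, features): densely connected layer
dense = layers.Dense(32, activation='relu', input_shape=(784,))

# 3D tensors (samples, timesteps, features): recurrent layer
lstm = layers.LSTM(32, input_shape=(100, 64))

# 4D tensors (samples, height, width, channels): 2D convolution layer
conv = layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))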



Layers: The Building Blocks of Deep Learning (cont.)
 You can think of layers as the LEGO bricks of deep learning, a metaphor that
is made explicit by frameworks like Keras.
 Building deep-learning models in Keras is done by clipping together
compatible layers to form useful data-transformation pipelines.
 The notion of layer compatibility here refers specifically to the fact that every
layer will only accept input tensors of a certain shape and will return output
tensors of a certain shape.
 Consider the following example:
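 The slide’s code figure isn’t reproduced here; a minimal sketch of such a
layer in Keras (a dense layer with 32 output units):

from keras import layers

layer = layers.Dense(32, input_shape=(784,))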

 We’re creating a layer that will only accept as input 2D tensors where the first
dimension is 784 (axis 0, the batch dimension, is unspecified, and thus any
value would be accepted).



Layers: The Building Blocks of Deep Learning (cont.)
 This layer can only be connected to a downstream layer that expects 32-
dimensional vectors as its input.
 When using Keras, you don’t have to worry about compatibility, because the
layers you add to your models are dynamically built to match the shape of the
incoming layer.
 For instance, suppose you write the following:
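 The slide’s code figure is missing; a sketch consistent with the description
that follows:

from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))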

 The second layer didn’t receive an input shape argument; instead, it
automatically inferred its input shape as being the output shape of the layer that
came before.



Models: Networks of Layers

 A deep-learning model is a directed, acyclic graph of layers. The most
common instance is a linear stack of layers, mapping a single input to a single
output.
 Some network topologies are,
 Two-branch networks
 Multihead networks
 Inception blocks
 The topology of a network defines a hypothesis space.
 By choosing a network topology, you constrain your space of possibilities
(hypothesis space) to a specific series of tensor operations, mapping input
data to output data.



Models: Networks of Layers (cont.)

 We look for a good set of values for the weight tensors involved in these tensor
operations.

 Picking the right network architecture is more an art than a science, and
although there are some best practices and principles you can rely on, only
practice can help you become a proper neural-network architect.



Loss Functions and Optimizers

 Once the network architecture is defined, you still need to choose two things.
 Loss function (objective function): The quantity that will be minimized during
training. It represents a measure of success for the task at hand.
 Optimizer: Determines how the network will be updated based on the loss function.
It implements a specific variant of stochastic gradient descent (SGD).

 A neural network that has multiple outputs may have multiple loss functions
(one per output).

 But the gradient-descent process must be based on a single scalar loss value.
So, for multi-loss networks, all losses are combined (via averaging) into a single
scalar quantity.
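 As a sketch, a two-output model might be compiled as follows (the head
names, shapes, and loss weights below are made up for illustration):

from keras import models, layers

inputs = layers.Input(shape=(64,))
x = layers.Dense(32, activation='relu')(inputs)
price = layers.Dense(1, name='price')(x)  # regression head
clicked = layers.Dense(1, activation='sigmoid', name='clicked')(x)  # binary head
model = models.Model(inputs, [price, clicked])

# One loss per output; Keras combines them (a weighted average) into the
# single scalar that gradient descent minimizes.
model.compile(optimizer='rmsprop',
              loss={'price': 'mse', 'clicked': 'binary_crossentropy'},
              loss_weights={'price': 0.5, 'clicked': 1.0})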



Loss Functions and Optimizers (cont.)

 Choosing the right objective function for the right problem is extremely
important:
 If the objective doesn’t fully correlate with success for the task at hand, your
network will end up doing things you may not have wanted.
 Imagine a stupid, omnipotent AI trained via SGD, with this poorly chosen objective
function: “maximizing the average well-being of all humans alive.”
 To make its job easier, this AI might choose to kill all humans except a few and
focus on the well-being of the remaining ones, because average well-being isn’t
affected by how many humans are left.
 When it comes to problems such as classification, regression, and sequence
prediction, there are simple guidelines to choose the correct loss.
 For instance, you’ll use binary crossentropy for a two-class classification
problem, categorical crossentropy for a many-class classification problem,
mean squared error for a regression problem, connectionist temporal
classification (CTC) for a sequence-learning problem, and so on.
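 For example, a tiny two-class model (hypothetical layer sizes and input
shape) would be compiled with binary crossentropy:

from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(20,)))
model.add(layers.Dense(1, activation='sigmoid'))

# Two-class classification -> binary crossentropy; a many-class model would
# use 'categorical_crossentropy', and a regression model 'mse'.
model.compile(optimizer='rmsprop', loss='binary_crossentropy')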



Introduction to Keras

 Keras is a deep-learning framework for Python that provides a convenient
way to define and train almost any kind of deep-learning model.

 Keras has the following key features:


 It allows the same code to run seamlessly on CPU or GPU.
 It has a user-friendly API that makes it easy to quickly prototype deep-learning
models.
 It has built-in support for convolutional networks (for computer vision), recurrent
networks (for sequence processing), and any combination of both.
 It supports arbitrary network architectures including multi-input or multi-output
models, layer sharing, model sharing, and so on.



Introduction to Keras (cont.)

 Keras is distributed under the permissive MIT license, which means it can be
freely used in commercial projects.

 It’s compatible with any version of Python from 2.7 to 3.X.

 Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of
startups working on a wide range of problems.

 Keras is also a popular framework on Kaggle, the machine-learning
competition website, where almost every recent deep-learning competition
has been won using Keras models.





Keras, TensorFlow, Theano, and CNTK

 Keras is a model-level library, providing high-level building blocks for
developing deep-learning models.
 It doesn’t handle low-level operations such as tensor manipulation and
differentiation.
 Instead, it relies on a specialized, well-optimized tensor library to do so, serving
as the backend engine of Keras.
 Rather than choosing a single tensor library and tying the implementation of
Keras to that library, Keras handles the problem in a modular way: several
different backend engines can be plugged seamlessly into Keras.
 Currently, the three existing backend implementations are the TensorFlow
backend, the Theano backend, and the Microsoft Cognitive Toolkit (CNTK)
backend.





Keras, TensorFlow, Theano, and CNTK (cont.)

 TensorFlow, CNTK, and Theano are some of the primary platforms for deep
learning today.
 Theano (http://deeplearning.net/software/theano) is developed by the MILA
lab at Université de Montréal,
 TensorFlow (www.tensorflow.org) is developed by Google, and
 CNTK (https://github.com/Microsoft/CNTK) is developed by Microsoft.

 Any piece of code that you write with Keras can be run with any of these
backends without having to change anything in the code:
 You can seamlessly switch among them during development, which often
proves useful, for instance if one of these backends proves to be faster for a
specific task.
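 In multi-backend Keras, one way to switch is the KERAS_BACKEND environment
variable (a minimal sketch; the backend field in ~/.keras/keras.json works too):

import os

# Must be set before Keras is first imported; valid values are
# "tensorflow", "theano", and "cntk".
os.environ['KERAS_BACKEND'] = 'theano'

import keras  # prints e.g. "Using Theano backend."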



Keras, TensorFlow, Theano, and CNTK (cont.)

 It is recommended to use the TensorFlow backend as the default for most of
your deep-learning needs, because it’s the most widely adopted, scalable, and
production-ready.

 Via TensorFlow (or Theano, or CNTK), Keras is able to run seamlessly on both
CPUs and GPUs.

 When running on CPU, TensorFlow is itself wrapping a low-level library for
tensor operations called Eigen (http://eigen.tuxfamily.org).

 On GPU, TensorFlow wraps a library of well-optimized deep-learning
operations called the NVIDIA CUDA Deep Neural Network library (cuDNN).



Developing with Keras: A Quick Overview

 As a refresher, here’s a two-layer model defined using the Sequential class
(note that we’re passing the expected shape of the input data to the first
layer):
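 The slide shows this as a code figure; a sketch of such a model, with
illustrative layer sizes:

from keras import models, layers

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))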

 And here’s the same model defined using the functional API:
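 Again as a sketch, the equivalent model in the functional API:

from keras import models, layers

input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs=input_tensor, outputs=output_tensor)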

 With the functional API, you’re manipulating the data tensors that the model
processes and applying layers to these tensors as if they were functions.



Developing with Keras: A Quick Overview (cont.)

 Once your model architecture is defined, it doesn’t matter whether you used
a Sequential model or the functional API. All of the following steps are the
same.
 The learning process is configured in the compilation step, where you specify
the optimizer and loss function(s) that the model should use, as well as the
metrics you want to monitor during training.
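 For instance (a sketch continuing the model above; the learning rate, loss,
and metric are illustrative):

from keras import optimizers

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])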

 Finally, the learning process consists of passing Numpy arrays of input data
(and the corresponding target data) to the model via the fit() method.
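 As a sketch (x_train and y_train are hypothetical Numpy arrays of inputs and
targets; the batch size and epoch count are illustrative):

model.fit(x_train, y_train, batch_size=128, epochs=10)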



Setting Up A Deep Learning Workstation

 It’s highly recommended, although not strictly necessary, that you run deep
learning code on a modern NVIDIA GPU.
 And even for applications that can realistically be run on CPU, you’ll generally
see a speed increase by a factor of 5 or 10 by using a modern GPU.
 Whether you’re running locally or in the cloud, it’s better to be using a Unix
workstation.
 Although it’s technically possible to use Keras on Windows (all three Keras
backends support Windows), we don’t recommend it.
 If you’re a Windows user, the simplest solution to get everything running is to
set up an Ubuntu dual boot on your machine.
 Note that in order to use Keras, you need to install TensorFlow or CNTK or
Theano (or all of them, if you want to be able to switch back and forth among
the three backends).



Jupyter Notebook to Run Deep Learning Experiments

 Jupyter notebooks are a great way to run deep learning experiments.


 They’re widely used in the data-science and machine-learning communities.
 A notebook is a file generated by the Jupyter Notebook app
(https://jupyter.org), which you can edit in your browser.
 It mixes the ability to execute Python code with rich text-editing capabilities
for annotating what you’re doing.
 A notebook also allows you to break up long experiments into smaller pieces
that can be executed independently, which makes development interactive
and means you don’t have to rerun all of your previous code if something
goes wrong late in an experiment.
 Jupyter notebooks are recommended to get started with Keras, although they
aren’t a requirement: you can also run standalone Python scripts or run code
from within an IDE such as PyCharm.




Thank You

