2 Deep Learning
Samatrix Consulting Pvt Ltd
Deep Learning
• Deep learning is a branch of machine learning and, therefore, a part of artificial intelligence.
• Deep learning uses artificial neural networks (ANNs) to help a machine learn and to understand the content of images, natural language, and speech.
• The origins of deep learning can be attributed to Walter Pitts and Warren McCulloch.
• In 1943, they created a computer model inspired by the neural networks present in the human brain.
Machine Learning vs Deep Learning
We can compare machine learning and deep learning based on six
important characteristics.
1. Quantity of data required for training
2. High accuracy while avoiding overfitting
3. Computational power
4. Cognitive ability
5. Hardware requirement
6. Time taken
Machine Learning vs Deep Learning
• Quantity of data required for training
• Traditional machine learning algorithms do not require much data, whereas deep learning models need a much larger amount of data.
• High accuracy while avoiding overfitting
• Compared to deep learning algorithms, machine learning algorithms are relatively less accurate because they make inferences from smaller amounts of training data.
• Deep learning algorithms, on the other hand, learn from large amounts of data and are therefore more accurate.
Machine Learning vs Deep Learning
• Computational Power
• Because machine learning algorithms use less data, they consume less computational power than deep learning algorithms.
• Deep learning algorithms require more power to analyze the data and train the model.
• Cognitive ability
• Cognitive ability refers to the ability of the algorithm to understand the inaccuracies
and sort out the issues on its own.
• A machine learning model has lower cognitive ability. To adjust to changes in the training data or to improve the accuracy of its predictions, a programmer is required to make the necessary changes and retrain the model.
• Deep learning models, on the other hand, have higher cognitive ability.
• They can learn from the data and make the necessary changes on their own.
Machine Learning vs Deep Learning
• Hardware requirement
• The traditional machine learning models can be trained on low-end systems.
• Deep learning models, on the other hand, require high-end, sophisticated machines equipped with GPUs.
• Time taken
• Compared to the machine learning algorithms, the deep learning algorithms
need a longer time to train the models.
The Neuron
• In the previous section, we studied that deep learning uses artificial neural networks to solve complex problems without being explicitly programmed.
• The neural network, or artificial neural network, is inspired by and modeled after biological neural networks.
• The foundational unit of the human brain is the neuron.
• A grain-of-rice-sized piece of the human brain contains over 10,000 neurons.
• Each of these neurons forms an average of 6,000 connections with other neurons.
• With the help of such a massive biological network, we can experience the
world around us.
The Neuron
• The neuron receives the information from other
neurons.
• It processes this information in a unique way and
then sends the result to other cells.
• The neuron receives the information through
dendrites.
• The strength of each incoming connection
determines the weight of the connection.
• The cell body computes the total input by summing the weighted signals from all the incoming connections.
• This sum is transformed into a new signal that is
propagated along the cell’s axon and sent off to
other neurons.
Artificial Neuron
• This functionality of neurons in our brain can be represented using artificial
neurons.
• The artificial neurons also take some number of inputs $x_1, x_2, \ldots, x_n$. Each of the inputs is multiplied by a specific weight, $w_1, w_2, \ldots, w_n$.
• We add the weighted inputs together to produce the logit of the neuron, $z = \sum_{i=1}^{n} w_i x_i$.
• The bias, a constant, is also part of the logit, but it is not shown in figure 3.
• We pass the logit through a function $f$ to produce the output $y = f(z)$.
• We transmit the output to other neurons.
• In vector form, we can re-express the output of the neuron as $y = f(\mathbf{x} \cdot \mathbf{w} + b)$, where $b$ is the bias.
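To make this concrete, here is a minimal NumPy sketch of a single artificial neuron; the input values, weights, bias, and the choice of a sigmoid for f are illustrative assumptions, not taken from the slides:

import numpy as np

def sigmoid(z):
    # An example activation function f
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias (assumed values)
x = np.array([0.5, -1.2, 3.0])  # inputs x1, x2, x3
w = np.array([0.8, 0.1, -0.4])  # weights w1, w2, w3
b = 0.2                         # bias

z = np.dot(x, w) + b  # logit: weighted sum of inputs plus bias
y = sigmoid(z)        # output y = f(z)
print(y)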
Artificial Neuron Architecture
• The Artificial Neuron comprises the
following architecture
1. Input layer: This layer takes inputs from
other neurons or networks
2. Summation layer: This layer aggregates the
signals it receives
3. Activation layer: This layer takes the aggregated information and returns a value if the aggregated input crosses a certain threshold; otherwise, it does not fire
4. Output layer: This layer might be connected
to other neurons or networks. This layer acts
as a final output layer and is used for
predictions.
Linear Perceptron
• The linear perceptron is a simple algorithm that, given an input vector $\mathbf{x}$ of $n$ values $(x_1, x_2, \ldots, x_n)$, often called input features, outputs either a 1 (yes) or a 0 (no):
$$f(x) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le 0 \end{cases}$$
• The linear perceptron is used to classify the data into two parts using a linear hyperplane, as shown in figure 4.
• It is also known as a linear binary classifier.
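As an illustration, here is a minimal Python sketch of the linear perceptron's decision rule; the weights, bias, and sample points are assumed for the example:

import numpy as np

def perceptron_predict(x, w, b):
    # Outputs 1 if w . x + b > 0, otherwise 0
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative 2-D example: w and b define a separating hyperplane (a line)
w = np.array([1.0, -1.0])
b = 0.0
print(perceptron_predict(np.array([2.0, 1.0]), w, b))  # 1 (one side of the line)
print(perceptron_predict(np.array([1.0, 2.0]), w, b))  # 0 (other side)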
Feedforward Neural Network
• Artificial neural networks in which the connections between the neurons do not form a cycle are called feedforward neural networks.
• In these networks, the connections between the neurons move only in the forward direction, from the input layer through the hidden layers to the output layer; the information flows in the forward direction only.
• Every feedforward neural network must have at least two layers: an input layer and an output layer.
• The feedforward neural network approximates a function by mapping the input values fed into the input layer to the final output values produced by the output layer.
• It then compares these output values with the label values.
Shallow Feedforward Neural Network
• When a model has only an input layer and an output layer for function approximation, it is called a shallow feedforward neural network, or single-layer perceptron.
• We can directly compute the output values using the relationship $y = f(\mathbf{w} \cdot \mathbf{x} + b)$.
• Shallow feedforward neural networks are not able to approximate nonlinear functions.
• For that, we need hidden layers between the input and output layers.
Deep Feedforward Neural Network
• In a deep feedforward neural network, or multilayer perceptron (figure 6), we can add one or more hidden layers between the input layer and the output layer so that we can approximate more complex functions.
• In this architecture, every neuron is connected to the neurons in the next layer and uses an activation function.
• That is why these networks are also called fully connected neural networks.
• Deep neural networks can approximate any linear or nonlinear function. Hence, they are widely used to solve real-world problems.
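As a minimal sketch, such a fully connected network can be defined with the Keras Sequential API; the layer sizes and the ReLU activation below are illustrative assumptions:

import tensorflow as tf
from tensorflow import keras

# Illustrative multilayer perceptron: 784 inputs, two hidden layers, 10 outputs
model = keras.models.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),  # hidden layer 1
    keras.layers.Dense(64, activation='relu'),                       # hidden layer 2
    keras.layers.Dense(10, activation='softmax')                     # output layer
])
model.summary()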
Layers in Feedforward Neural Network
The generic neural network architecture consists of three types of
layers:
• An Input Layer
• An Output Layer
• A number of hidden layers
Input Layer
• The very first layer of the feedforward neural network is known as the
input layer.
• This layer is used to feed data into the network.
• No activation function is applied on the input layer.
• Its sole purpose is to get the data into the system.
• Ideally, the number of neurons in the input layer should be equal to the number of input features.
• For example, if our model uses four input variables to predict one
response variable, we should use four neurons in the input layer.
Output Layer
• The very last layer of the feedforward neural network is known as the
output layer.
• This layer is used to output the prediction.
• Based on the nature of the problem, we decide on the number of
neurons in the output layer.
• For regression, we need to predict a single value, hence, we require
only one neuron in the output layer.
• For binary classification, we need two neurons in the output layer.
• For multi-class classification with five different classes, we need five
neurons in the output layer.
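As a rough sketch of these three cases in Keras (the layer sizes follow the slide; the activations shown are common choices, not prescribed by it):

from tensorflow import keras

# Regression: one output neuron, no activation
regression_output = keras.layers.Dense(1)

# Binary classification: two output neurons
binary_output = keras.layers.Dense(2, activation='softmax')

# Multi-class classification with five classes: five output neurons
multiclass_output = keras.layers.Dense(5, activation='softmax')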
Hidden Layer
• In the feedforward neural network, the hidden layer is located
between the input and output layers.
• The hidden layers are responsible for the nonlinear transformation of the inputs that enter the network.
TensorFlow Code for Neural Network
• Sequential API is the simplest way to create a deep neural network
model in TensorFlow 2.0.
• A Sequential() model creates a stack of neural network layers.
• The following code fragment defines a single layer that expects 784 input variables (features).
• Our neural network is dense, which means that each neuron in a layer
is connected to all the neurons located in the previous layer, and to all
the neurons in the following layer:
TensorFlow Code for Neural Network
import tensorflow as tf
from tensorflow import keras

NB_CLASSES = 10  # number of output classes
RESHAPED = 784   # number of input features

# A Sequential model stacks layers; here, a single dense (fully connected) layer
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(NB_CLASSES,
                             input_shape=(RESHAPED,),
                             kernel_initializer='zeros',
                             name='dense_layer',
                             activation='softmax'))
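Before training, such a model is typically compiled with a loss function and an optimizer; a minimal sketch (the SGD optimizer and categorical cross-entropy loss are illustrative choices, not part of the original fragment):

# Compile the model with an optimizer, a loss function, and a metric
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # prints the layer structure and parameter counts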
TensorFlow Code for Neural Network
We can initialize each neuron with specific weights using the kernel_initializer parameter, with values such as 'random_uniform', 'random_normal', or 'zeros'.
Information Theory
• The self-information, or information content, of an event $x$ is defined as $I(x) = -\log P(x)$.
• Here, $\log$ is the natural logarithm. For example, if the probability of an event is $P(x) = 0.8$, then $I(x) = 0.22$.
• Alternatively, if 𝑃 𝑥 = 0.2, then 𝐼 𝑥 = 1.61.
• Hence, we can see that the information content of an event is inversely related to the probability of the event.
• We can measure the amount of self-information using a natural unit
of information called nat.
• We can also use the base-2 logarithm, i.e., $I(x) = -\log_2 P(x)$. In this case, we measure it in bits.
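These values can be checked with a short Python sketch using the standard math module:

import math

def self_information(p, base=math.e):
    # I(x) = -log P(x); the natural log gives nats, base 2 gives bits
    return -math.log(p, base)

print(round(self_information(0.8), 2))          # 0.22 nats
print(round(self_information(0.2), 2))          # 1.61 nats
print(round(self_information(0.2, base=2), 2))  # 2.32 bits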
Information Theory
• Since there is no fundamental difference between the two versions, we will use the natural logarithm version in this section.
• The examples given above relate to a single outcome.
• We can also handle multiple outcomes by measuring the amount of information over the probability distribution of the random variable.
• We can denote it using $I(X)$, where $X$ is a discrete random variable.
• The mean (or expected value) of a discrete random variable is the
weighted sum of all possible values multiplied by their probabilities.
• In this case, also, we will multiply the information content of each
event by the probability of that event.
Shannon Entropy
• We call this measure Shannon entropy (or simply entropy). We can define Shannon entropy as follows:
$$H(X) = E[I(X)] = -\sum_{i=1}^{n} P(X = x_i) \log P(X = x_i)$$
• In this case, $x_i$ represents a value of the discrete random variable. The events with higher probabilities carry more weight than the events with lower probabilities.
• Let's compute the entropy using coin toss examples.
Shannon Entropy
Example 1: Let's assume $P(\text{heads}) = P(\text{tails}) = 0.5$. In this case, the entropy is
$$H(X) = -(0.5 \log 0.5 + 0.5 \log 0.5) = \log 2 \approx 0.693 \text{ nats}$$
Example 2: Let's assume that the coin is biased and the outcomes are not equally likely, with $P(\text{heads}) = 0.2$ and $P(\text{tails}) = 0.8$. In this case, the entropy is
$$H(X) = -(0.2 \log 0.2 + 0.8 \log 0.8) \approx 0.500 \text{ nats}$$
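These entropy values can be verified with a short Python sketch:

import math

def entropy(probs):
    # H(X) = -sum of p_i * ln(p_i), measured in nats
    return -sum(p * math.log(p) for p in probs)

print(f"{entropy([0.5, 0.5]):.3f}")  # 0.693 (fair coin)
print(f"{entropy([0.2, 0.8]):.3f}")  # 0.500 (biased coin)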
Cross-Entropy
• When we measure the information content using an estimated probability distribution $Q$ instead of the true (target) distribution $P$, we get the cross-entropy between the two distributions:
$$H(P, Q) = -\sum_{i=1}^{n} P(X = x_i) \log Q(X = x_i)$$
• For example, let's calculate the cross-entropy between the two probability distributions from the previous coin toss scenario, taking the fair coin as the target $P$ and the biased coin as the prediction $Q$:
$$H(P, Q) = -(0.5 \log 0.2 + 0.5 \log 0.8) \approx 0.916 \text{ nats}$$
Kullback-Leibler divergence (KL divergence)
• The KL divergence is the difference between the cross-entropy and the entropy of the target distribution:
$$D_{KL}(P \,\|\, Q) = H(P, Q) - H(P)$$
• We can see that the KL divergence measures the difference between the target and the predicted log probabilities.
• The KL divergence for the coin toss example is as follows:
$$D_{KL}(P \,\|\, Q) = P(\text{heads})\,[\log P(\text{heads}) - \log Q(\text{heads})] + P(\text{tails})\,[\log P(\text{tails}) - \log Q(\text{tails})]$$
$$= 0.5(\log 0.5 - \log 0.2) + 0.5(\log 0.5 - \log 0.8) \approx 0.22$$
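A short Python sketch can verify both numbers, assuming natural logarithms throughout:

import math

def cross_entropy(p, q):
    # H(P, Q) = -sum of p_i * ln(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    # D_KL(P || Q) = H(P, Q) - H(P)
    return cross_entropy(p, q) - cross_entropy(p, p)

p = [0.5, 0.5]  # target distribution: fair coin
q = [0.2, 0.8]  # predicted distribution: biased coin
print(f"{cross_entropy(p, q):.3f}")  # 0.916
print(f"{kl_divergence(p, q):.3f}")  # 0.223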
Thanks
Samatrix Consulting Pvt Ltd