2 DeepLearning

The document discusses neural networks and deep learning. It defines key concepts like deep learning, artificial neural networks, machine learning vs deep learning, the neuron, artificial neuron, linear perceptron, feedforward neural networks, layers in neural networks, and provides a TensorFlow code example to create a neural network model. It compares machine learning and deep learning based on factors like data requirements, accuracy, computation power, cognitive ability, hardware needs and time taken.

Uploaded by

SWAMYA RANJAN DAS

Neural Network and Deep Learning
Samatrix Consulting Pvt Ltd
Deep Learning
• Deep learning is a branch of machine learning.
• Deep learning uses artificial neural networks to understand the content of
images, natural language, and speech.
• As a subset of machine learning, deep learning is also a part of artificial intelligence.
• The origins of deep learning can be attributed to Walter Pitts and Warren
McCulloch.
• In 1943, they created a computer model by taking inspiration from neural
networks present in the human brain.
• Deep learning uses artificial neural networks (ANN) to help a machine
learn.
Machine Learning vs Deep Learning
We can compare machine learning and deep learning based on six
important characteristics.
1. Quantity of data required for training
2. High accuracy while avoiding overfitting
3. Computation Power
4. Cognitive ability
5. Hardware requirement
6. Time taken
• Quantity of data required for training
• Traditional machine learning algorithms can work with relatively small datasets, whereas deep learning models need a much larger amount of data.
• High accuracy while avoiding overfitting
• Traditional machine learning algorithms need less training data to make inferences, but their accuracy tends to level off as more data is added, so on large datasets they are often less accurate than deep learning algorithms.
• Deep learning algorithms, on the other hand, keep improving as the amount of data grows, and hence they are generally more accurate on large datasets. With too little data, they tend to overfit.
• Computational power
• Because machine learning algorithms use a smaller amount of data, they need less computational power than deep learning algorithms.
• Deep learning algorithms require more computational power to analyze the data and train the model.
• Cognitive ability
• Cognitive ability refers to the ability of the algorithm to recognize its inaccuracies and sort out the issues on its own.
• A machine learning model has a lower cognitive ability. To adjust to a change in the training data or to improve the accuracy of the predictions, a programmer has to make the necessary changes and retrain the model.
• Deep learning models, on the other hand, have a higher cognitive ability.
• They can learn from the data and make the necessary changes on their own.
• Hardware requirement
• Traditional machine learning models can be trained on low-end systems.
• Deep learning models, on the other hand, require high-end machines equipped with GPUs.
• Time taken
• Compared to the machine learning algorithms, the deep learning algorithms
need a longer time to train the models.
The Neuron
• In the previous section, we have studied that deep learning uses artificial
neural networks to solve complex problems without being explicitly
programmed.
• The neural network, or artificial neural network, has been inspired by and
modeled after the biological neural networks.
• The foundational unit of the human brain is the neuron.
• A piece of the human brain the size of a grain of rice contains over 10,000 neurons.
• Each of these neurons forms an average of 6,000 connections with other neurons.
• With the help of such a massive biological network, we can experience the
world around us.
• The neuron receives the information from other
neurons.
• It processes this information in a unique way and
then sends the result to other cells.
• The neuron receives the information through
dendrites.
• The strength of each incoming connection determines the weight given to its signal.
• The cell body calculates the total input by adding up the incoming signals, each weighted by the strength of its connection.
• This sum is transformed into a new signal that is
propagated along the cell’s axon and sent off to
other neurons.
Artificial Neuron
• This functionality of neurons in our brain can be represented using artificial
neurons.
• An artificial neuron also takes some number of inputs, $x_1, x_2, \dots, x_n$. Each input is multiplied by a specific weight, $w_1, w_2, \dots, w_n$.
• We add the weighted inputs together to produce the logit of the neuron, $z = \sum_{i=1}^{n} w_i x_i$.
• The bias $b$, a constant, is also part of the logit, so the full logit is $z = \sum_{i=1}^{n} w_i x_i + b$. The bias is not shown in figure 3.
• We pass the logit to a function $f$ to produce the output $y = f(z)$.
• We transmit the output to other neurons.
• In vector form, we can re-express the output of the neuron as $y = f(\mathbf{x} \cdot \mathbf{w} + b)$, where $b$ is the bias.
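The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the input, weight, and bias values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def neuron_output(x, w, b, f):
    """Return f(x . w + b) for a single artificial neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # the logit
    return f(z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 2.0, 3.0]    # inputs (hypothetical values)
w = [0.5, -1.0, 0.25]  # one weight per input (hypothetical values)
b = 0.75               # bias (hypothetical value)

y_linear = neuron_output(x, w, b, lambda z: z)  # identity activation
y_sigmoid = neuron_output(x, w, b, sigmoid)     # sigmoid activation
print(y_linear, y_sigmoid)
```

Here the weighted sum is 0.5 - 2.0 + 0.75 = -0.75, and adding the bias gives a logit of 0, so the only difference between the two outputs is the choice of the function $f$.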
Artificial Neuron Architecture
• The artificial neuron comprises the following architecture:
1. Input layer: This layer takes inputs from other neurons or networks.
2. Summation layer: This layer aggregates the signals it receives.
3. Activation layer: This layer takes the aggregated input and returns a value if it crosses a certain threshold; otherwise, the neuron does not fire.
4. Output layer: This layer might be connected to other neurons or networks. It acts as the final output layer and is used for predictions.
Linear Perceptron
• The linear perceptron is a simple algorithm that, given an input vector $\mathbf{x}$ of $n$ values $(x_1, x_2, \dots, x_n)$, often called input features, outputs either 1 (yes) or 0 (no):
$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le 0 \end{cases}$$
• The linear perceptron classifies the data into two parts using a linear hyperplane, as shown in figure 4.
• It is also known as a linear binary classifier.
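As a small sketch of the decision rule, the following perceptron uses hand-picked weights (an assumption made for illustration, not values from the slides) so that it behaves like a logical AND of two binary inputs.

```python
def perceptron(x, w, b):
    """Return 1 if w . x + b > 0, otherwise 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


# Hypothetical weights and bias: the line x1 + x2 = 1.5 separates
# the point (1, 1) from the other three corners of the unit square.
w_and = [1.0, 1.0]
b_and = -1.5

and_outputs = [perceptron([x1, x2], w_and, b_and)
               for x1 in (0, 1) for x2 in (0, 1)]
print(and_outputs)
```

Only the input (1, 1) lies on the positive side of the hyperplane, so it is the only one classified as 1.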
Feedforward Neural Network
• Artificial neural networks in which the connections between the neurons do not form a cycle are called feedforward neural networks.
• In these neural networks, the connections between the neurons move only in a forward direction, from the input layer through the hidden layers to the output layer.
• In these networks, the information flows in the forward direction only.
• Every feedforward neural network should have at least two layers: an input layer and an output layer.
• The feedforward neural network approximates a function: input values are fed in through the input layer and the final output values are read from the output layer.
• During training, the output values are compared with the label values.
Shallow Feedforward Neural Network
• When a model has only an input and an output layer for function approximation, it is called a shallow feedforward neural network or single-layer perceptron.
• We can directly compute the output values using the relationship $y = f(\mathbf{w} \cdot \mathbf{x} + b)$.
• Shallow feedforward neural networks are not useful for approximating nonlinear functions.
• For that, we need hidden layers between the input and the output.
Deep Feedforward Neural Network
• In a deep feedforward neural network or multilayer perceptron (figure 6), we add one or more hidden layers between the input layer and the output layer so that we can approximate more complex functions.
• In this architecture, every neuron is connected to every neuron in the next layer and uses an activation function.
• That is why they are also called fully connected neural networks.
• With enough hidden neurons and nonlinear activation functions, deep neural networks can approximate a very wide range of linear and nonlinear functions. Hence, they are widely used to solve real-world problems.
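To make the role of the hidden layer concrete, here is a minimal sketch of a forward pass through a fully connected network with one hidden layer. The XOR function is a standard example of a mapping that no single linear hyperplane can separate. The weights below are hand-picked for illustration (they are not learned and do not come from the slides), and a simple step function stands in for the activation.

```python
def step(z):
    return 1 if z > 0 else 0


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weight row and one bias per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical hand-picked parameters.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]  # hidden neuron 1 acts as OR, neuron 2 as AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1.0, -1.0]]             # output fires for "OR but not AND"
OUTPUT_B = [-0.5]


def xor_network(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, step)
    return dense(hidden, OUTPUT_W, OUTPUT_B, step)[0]


xor_outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)
```

Each layer feeds only the next one, so the information flows strictly forward, and the two hidden neurons give the output neuron a representation in which the classes become linearly separable.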
Layers in Feedforward Neural Network
The generic neural network architecture consists of three types of
layers:

• An Input Layer
• An Output Layer
• A number of hidden layers
Input Layer
• The very first layer of the feedforward neural network is known as the
input layer.
• This layer is used to feed data into the network.
• No activation function is applied on the input layer.
• Its sole purpose is to get the data into the system.
• Ideally, the number of neurons in the input layer should be equal to the number of features.
• For example, if our model uses four input variables to predict one
response variable, we should use four neurons in the input layer.
Output Layer
• The very last layer of the feedforward neural network is known as the
output layer.
• This layer is used to output the prediction.
• Based on the nature of the problem, we decide on the number of
neurons in the output layer.
• For regression, we need to predict a single value, hence, we require
only one neuron in the output layer.
• For binary classification, we need either a single neuron in the output layer (with a sigmoid activation) or two neurons (with a softmax activation).
• For multi-class classification with five different classes, we need five
neurons in the output layer.
Hidden Layer
• In the feedforward neural network, the hidden layer is located
between the input and output layers.
• The hidden layers are responsible for the nonlinear transformation of
the input that has entered into the network.
TensorFlow Code for Neural Network
• The Sequential API is the simplest way to create a deep neural network model in TensorFlow 2.0.
• A Sequential() model creates a stack of neural network layers.
• The following code fragment defines a single layer with 10 neurons that expects 784 input variables (features).
• Our neural network is dense, which means that each neuron in a layer is connected to all the neurons located in the previous layer, and to all the neurons in the following layer:
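import tensorflow as tf
from tensorflow import keras

NB_CLASSES = 10   # number of output neurons, one per class
RESHAPED = 784    # number of input features

# Create an empty stack of layers, then add one dense softmax layer.
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(NB_CLASSES,
                             input_shape=(RESHAPED,),
                             kernel_initializer='zeros',
                             name='dense_layer',
                             activation='softmax'))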
We can initialize the weights of each neuron using the kernel_initializer parameter with values such as:

• random_uniform: Weights are initialized to uniformly random values in the range -0.05 to 0.05.
• random_normal: Weights are drawn from a normal distribution with zero mean and a standard deviation of 0.05.
• zeros: All weights are initialized to zero.
Limitations of Linear Neurons
• Linear neurons are easy to compute, but they have serious limitations.
• Any feedforward neural network consisting only of linear neurons can be expressed as a neural network without any hidden layer.
• Real-world problems are very complex and they are far from a linear solution.
• In order to solve real-world complex problems, we need to build nonlinearity into our model.
• That can be achieved using hidden layers whose neurons apply a nonlinear activation function.
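The limitation of purely linear neurons can be checked numerically. The sketch below, with arbitrary hypothetical weights, stacks two linear layers and shows that a single linear layer with weights $W = W_2 W_1$ and bias $b = W_2 b_1 + b_2$ produces the same output, so a hidden layer without a nonlinear activation adds no expressive power.

```python
def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]


# Hypothetical parameters: layer 1 maps 2 inputs to 3 hidden values,
# layer 2 maps the 3 hidden values to 1 output. No activation is applied.
W1, b1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, 0.0, -1.0]
W2, b2 = [[2.0, 1.0, -1.0]], [0.25]

x = [4.0, -2.0]
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Equivalent single linear layer.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layer, one_layer)
```

Both computations give the same value for any input, which is why nonlinearity has to come from the activation function rather than from depth alone.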
Sigmoid Neuron
• The sigmoid neurons use the function
$$f(z) = \frac{1}{1 + e^{-z}}, \quad z = \mathbf{w}^T \mathbf{x} + b$$
• It means that if the logit is very negative, the output of the logistic neuron is close to zero.
• If the logit is very large and positive, the output of the logistic function will be close to one.
• The output follows an S-shaped curve between these two extremes, as shown in figure 7.
• In other words, the sigmoid squashes arbitrary values into the (0, 1) interval, so its output can be interpreted as a probability between 0 and 1.
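The squashing behaviour is easy to see by evaluating the function at a few logits. This is a minimal sketch; the sample logits are arbitrary, and the plain formula used here is fine for moderate inputs but would overflow for extremely negative ones.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


sample_logits = [-10.0, -1.0, 0.0, 1.0, 10.0]  # arbitrary illustrative values
sigmoid_values = [sigmoid(z) for z in sample_logits]

for z, s in zip(sample_logits, sigmoid_values):
    print(f"z = {z:6.1f}   sigmoid(z) = {s:.5f}")
```

The outputs rise monotonically from nearly 0 to nearly 1 and pass through exactly 0.5 at a logit of 0.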
Tanh Neuron
• Similar to sigmoid neurons, the tanh neurons also use S-shaped
nonlinearity.
• However, the output of tanh neurons ranges from -1 to 1.
• On several occasions, we prefer tanh neurons over sigmoid neurons
because the tanh neuron is zero-centered.
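The relationship between the two S-shaped functions can be made explicit: tanh is a rescaled and shifted sigmoid, $\tanh(z) = 2\sigma(2z) - 1$. The short sketch below checks this identity numerically on a few arbitrary sample points.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


zs = [-3.0, -1.0, 0.0, 1.0, 3.0]  # arbitrary illustrative values
tanh_values = [math.tanh(z) for z in zs]
rescaled_sigmoid = [2.0 * sigmoid(2.0 * z) - 1.0 for z in zs]

for z, t, r in zip(zs, tanh_values, rescaled_sigmoid):
    print(f"z = {z:5.1f}   tanh = {t:+.5f}   2*sigmoid(2z)-1 = {r:+.5f}")
```

A logit of 0 maps to 0 under tanh but to 0.5 under the sigmoid, which is what zero-centered means here.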
ReLU (Rectified Linear Unit)
• The ReLU neuron uses a different kind of nonlinearity.
• It uses the function $f(z) = \max(0, z)$.
• It results in a characteristic hockey-stick-shaped response.
• ReLU is one of the most popular neuron types.
• It is widely used in solving computer vision problems.
• ReLU zeroes out the negative values, as shown in figure 9.
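A one-line sketch of the function, evaluated on a few arbitrary sample logits, shows the hockey-stick shape.

```python
def relu(z):
    return max(0.0, z)


relu_values = [relu(z) for z in (-2.0, -0.5, 0.0, 0.5, 2.0)]
print(relu_values)  # negative inputs become 0.0, positive inputs pass through
```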
SoftMax Output Layer
• On several occasions, we want the output vector to be a probability distribution over a set of mutually exclusive labels.
• An example is the project to recognize handwritten digits (ten mutually exclusive digits, 0 through 9) from the MNIST dataset using neural networks.
• However, we would not be able to classify each digit with 100% confidence.
• So, we calculate the probability vector $(p_0, p_1, \dots, p_9)$ over the digits such that $\sum_{i=0}^{9} p_i = 1$.
• We can achieve this using a special output layer, known as the softmax layer.
• In the softmax layer, the output of a neuron depends on the outputs of all other neurons in the softmax layer.
• We calculate the output of a particular neuron in the softmax layer by dividing the exponential of its logit by the sum of the exponentials of the logits of all the neurons in the layer: $y_i = e^{z_i} / \sum_j e^{z_j}$.
• In the case of a strong prediction, the output from one of the neurons in the softmax layer will be close to 1, whereas the output from the rest of the neurons in the layer will be close to 0.
• In the case of a weak prediction, multiple neurons in the softmax layer will have more or less equal values.
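The following sketch implements the standard softmax, in which each logit is exponentiated and the results are normalized to sum to 1, and applies it to two hypothetical logit vectors: one that gives a strong prediction and one that gives a weak prediction. Subtracting the largest logit before exponentiating is a common numerical precaution and does not change the result.

```python
import math


def softmax(logits):
    shift = max(logits)  # avoids overflow; the output is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


strong = softmax([8.0, 1.0, 0.5])  # hypothetical logits, one clearly dominant
weak = softmax([1.0, 1.1, 0.9])    # hypothetical logits, all nearly equal

print([round(p, 4) for p in strong])
print([round(p, 4) for p in weak])
```

In the first case almost all of the probability mass goes to one neuron, while in the second case the three outputs stay close to one third each.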
Activation Functions
• In neural network jargon, Sigmoid, Tanh, ReLU, and softmax are called activation functions.
• The activation functions are the basic building blocks of a learning
algorithm. An example of an activation function that is applied after a
linear function is illustrated in Figure – 10.
• We can compare the ReLU, Tanh, and Sigmoid functions as follows:
• The ReLU function is a general-purpose activation function that is widely used in neural networks. It is typically used in hidden layers.
• The sigmoid function is well suited to the output layer of a binary classification task.
• Sigmoid and Tanh functions saturate for large positive or negative inputs, which can cause vanishing gradient problems in deep networks.
• A good strategy is to start with ReLU and then try other activation functions to check if the performance improves.
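The vanishing gradient issue can be illustrated with a deliberately simplified calculation. The derivative of the sigmoid is $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which never exceeds 0.25, whereas the derivative of ReLU is 1 for positive inputs. The sketch below multiplies the activation derivatives across a hypothetical 10-layer chain and ignores the weights entirely, so it only shows the trend, not the behaviour of any real network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


DEPTH = 10  # hypothetical number of layers

# Best case for the sigmoid: every logit sits at 0, where the slope peaks.
sigmoid_chain = sigmoid_grad(0.0) ** DEPTH
# ReLU with positive logits passes the gradient through unchanged.
relu_chain = relu_grad(1.0) ** DEPTH

print(sigmoid_chain, relu_chain)
```

Even in the sigmoid's best case the product shrinks below one millionth after ten layers, which is one reason ReLU is a common first choice for hidden layers.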
Loss (Cost or Error) Function
• We use loss functions to measure the performance of a deep learning
model for given data.
• The loss function is generally based on error terms.
• The error terms can be calculated by finding out the distance
between the real (measured) value and the predicted value of the
trained model.
$$\text{Error} = \text{Measured Value} - \text{Predicted Value}$$
$$e_i = y_i - \hat{y}_i$$
• The function is referred to as loss function, cost function, or error
function.
• In deep learning, we use several loss functions to evaluate the performance.
• We should choose the right loss function to find the optimum solution for our problem.
• The selection of the loss function depends upon the nature of the problem.
• For example, for regression problems, the mean squared error (MSE) or the root mean squared error (RMSE) is a common choice.
• For multi-class classification problems, we should use multi-class cross-entropy.
• For deep learning tasks, several loss functions are available.
• Root mean squared error (root_mean_squared_error), mean squared error (mean_squared_error), mean absolute error (mean_absolute_error), and mean absolute percentage error (mean_absolute_percentage_error) are appropriate loss functions for regression.
• For two-class classification problems, you should use binary cross-entropy (binary_crossentropy).
• For many-class classification problems, you should use categorical cross-entropy (categorical_crossentropy).
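The regression losses listed above can be written out from their textbook definitions in terms of the errors $e_i = y_i - \hat{y}_i$. The sketch below is not the implementation of any particular library, and the measured and predicted values are hypothetical.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # Assumes no measured value is zero.
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [2.0, 4.0, 5.0, 10.0]  # measured values (hypothetical)
y_pred = [1.0, 4.0, 7.0, 10.0]  # predicted values (hypothetical)

print(mse(y_true, y_pred), rmse(y_true, y_pred),
      mae(y_true, y_pred), mape(y_true, y_pred))
```

The errors here are 1, 0, -2, and 0. The squared-error losses penalize the single large error more heavily than the absolute-error losses do, which is one consideration when picking a loss for a regression problem.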
• Cross-entropy is a quantity from the field of information theory.
• It measures the difference between two probability distributions.
• In this case, it measures the difference between the true distribution and our predictions.
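For a classifier, the true distribution is usually a one-hot vector that puts all the probability on the correct class, so the cross-entropy reduces to the negative log of the probability the model assigns to that class. The sketch below uses hypothetical three-class predictions to show how the loss reacts to a confident and an unsure prediction.

```python
import math


def categorical_cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)


target = [0.0, 0.0, 1.0]        # one-hot: the true class is the third one
confident = [0.05, 0.05, 0.90]  # hypothetical prediction, mostly correct
unsure = [0.30, 0.40, 0.30]     # hypothetical prediction, spread out

loss_confident = categorical_cross_entropy(target, confident)
loss_unsure = categorical_cross_entropy(target, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

The loss is small when the predicted probability of the true class is close to 1 and grows as that probability falls.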
Information Theory
We use information theory to determine the amount of information an event carries. The principles of information theory are as follows:

• If the probability of an event is high, its occurrence is considered less informative. On the other hand, if the probability is low, its occurrence is considered highly informative.
• The information from independent events can be calculated by adding their individual information content.

The amount of information of an event $x$ is defined as follows:

$$I(x) = -\log P(x)$$
• In this case, $\log$ is the natural logarithm. For example, if the probability of an event is $P(x) = 0.8$, then $I(x) = 0.22$.
• Alternatively, if $P(x) = 0.2$, then $I(x) = 1.61$.
• Hence, the information content of an event decreases as its probability increases.
• With the natural logarithm, self-information is measured in a natural unit of information called the nat.
• We can also use the base-2 logarithm, i.e., $I(x) = -\log_2 P(x)$. In this case, we measure it in bits.
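These numbers can be reproduced with a few lines of Python. The function names below are illustrative only. The last check also confirms the additivity principle: for two independent events the probabilities multiply, so their information contents add.

```python
import math


def info_nats(p):
    """Self-information in nats (natural logarithm)."""
    return -math.log(p)


def info_bits(p):
    """Self-information in bits (base-2 logarithm)."""
    return -math.log2(p)


print(round(info_nats(0.8), 2), round(info_nats(0.2), 2))
print(info_bits(0.5))  # a fair coin flip carries one bit

# Two independent events with probabilities 0.8 and 0.2.
joint = info_nats(0.8 * 0.2)
separate = info_nats(0.8) + info_nats(0.2)
```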
• Since there is no fundamental difference between the two versions, we will use the natural logarithm version in this section.
• The example given above relates to a single outcome.
• We can also use it for multiple outcomes by measuring the amount of information over the probability distribution of the random variable.
• We can denote it using $I(X)$, where $X$ is a discrete random variable.
• The mean (or expected value) of a discrete random variable is the weighted sum of all possible values multiplied by their probabilities.
• In this case also, we multiply the information content of each event by the probability of that event.
Shannon Entropy
• We call this measure Shannon entropy (or just entropy). We can define Shannon entropy as follows:
$$H(X) = E[I(X)] = -\sum_{i=1}^{n} P(X = x_i) \log P(X = x_i)$$
• In this case, $x_i$ represents a value of the discrete variable. Events with higher probabilities carry more weight in the sum than events with lower probabilities.
• Let us compute the entropy using coin toss examples.
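Example 1: Let’s assume $P(\text{heads}) = P(\text{tails}) = 0.5$. In this case, the entropy is

$$H(X) = -P(\text{heads}) \log P(\text{heads}) - P(\text{tails}) \log P(\text{tails})$$

$$= -0.5 \times (-0.69) - 0.5 \times (-0.69) = 0.69 \approx 0.7$$

Example 2: Let’s assume that the coin is biased and the outcomes are not equally likely: $P(\text{heads}) = 0.2$ and $P(\text{tails}) = 0.8$.

$$H(X) = -P(\text{heads}) \log P(\text{heads}) - P(\text{tails}) \log P(\text{tails})$$

$$= -0.2 \times (-1.61) - 0.8 \times (-0.22) \approx 0.5$$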
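Both coin toss examples can be checked with a short entropy function that uses the natural logarithm, as in the examples above. The function name is illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


fair = entropy([0.5, 0.5])    # Example 1: fair coin
biased = entropy([0.2, 0.8])  # Example 2: biased coin
print(round(fair, 2), round(biased, 2))
```

The fair coin gives the larger value, about 0.69 nats, which is the maximum possible for a variable with two outcomes.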
• Hence, we can say that the entropy is
highest when the outcomes are
equally likely and decreases when one
outcome becomes prevalent.
• So, we can use entropy as a
measurement of uncertainty or chaos.
• In the following diagram, we have
shown the graph of the entropy 𝐻(𝑋)
over a binary event (such as the coin
toss), depending on the probability
distribution of the two outcomes.
Cross-Entropy
• Now, let’s assume that we have a discrete random variable, 𝑋, and
two different probability distributions over it.
• In deep learning, this is a usual scenario. For example, the neural
network produces some probability distribution 𝑄(𝑋) and we want to
compare it with a target distribution 𝑃(𝑋) during training.
• Using cross-entropy, we can measure the difference between these two distributions. The cross-entropy can be defined as follows:
$$H(P, Q) = -\sum_{i=1}^{n} P(X = x_i) \log Q(X = x_i)$$
For example, let’s calculate the cross-entropy between two probability distributions from the previous coin toss scenario.

Predicted distribution: $Q(\text{heads}) = 0.2$, $Q(\text{tails}) = 0.8$

Target (true) distribution: $P(\text{heads}) = P(\text{tails}) = 0.5$

The cross-entropy can be calculated as follows:

$$H(P, Q) = -P(\text{heads}) \times \log Q(\text{heads}) - P(\text{tails}) \times \log Q(\text{tails})$$

$$= -0.5 \times (-1.61) - 0.5 \times (-0.22) = 0.915$$
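The same calculation can be sketched in Python. With unrounded logarithms the result is about 0.916; the small difference from the hand calculation comes from rounding the logarithms to two decimals. The function name is illustrative.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in nats between target p and prediction q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


P = [0.5, 0.5]  # target (true) distribution
Q = [0.2, 0.8]  # predicted distribution

h_pq = cross_entropy(P, Q)
h_pp = cross_entropy(P, P)  # equals the entropy of P
print(round(h_pq, 3), round(h_pp, 3))
```

The cross-entropy is smallest when the prediction matches the target exactly, in which case it equals the entropy of the target distribution.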

Kullback-Leibler divergence (KL divergence)
KL divergence is another measure of the difference between two probability distributions.

$$D_{KL}(P \| Q) = \sum_{i=1}^{n} P(X = x_i) \log \frac{P(X = x_i)}{Q(X = x_i)}$$

$$= \sum_{i=1}^{n} P(X = x_i) \left[ \log P(X = x_i) - \log Q(X = x_i) \right]$$

$$= \sum_{i=1}^{n} \left[ P(X = x_i) \log P(X = x_i) - P(X = x_i) \log Q(X = x_i) \right]$$

$$= H(P, Q) - H(P)$$

We can see that the KL divergence measures the difference between the target and the predicted log probabilities.
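The KL divergence of the coin toss example is as follows:

$$D_{KL}(P \| Q) = P(\text{heads}) \left[ \log P(\text{heads}) - \log Q(\text{heads}) \right] + P(\text{tails}) \left[ \log P(\text{tails}) - \log Q(\text{tails}) \right]$$

$$= 0.5 (\log 0.5 - \log 0.2) + 0.5 (\log 0.5 - \log 0.8) = 0.22$$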
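As a final check, the sketch below computes the KL divergence for the coin toss example directly and also as the cross-entropy minus the entropy of the target distribution. The function names are illustrative. The last line swaps the arguments to show that the KL divergence is not symmetric.

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)


P = [0.5, 0.5]  # target (true) distribution
Q = [0.2, 0.8]  # predicted distribution

kl_pq = kl_divergence(P, Q)
kl_from_entropies = cross_entropy(P, Q) - entropy(P)
kl_qp = kl_divergence(Q, P)  # arguments swapped
print(round(kl_pq, 2), round(kl_from_entropies, 2), round(kl_qp, 2))
```

Because the entropy of the target does not depend on the model, minimizing the cross-entropy during training also minimizes the KL divergence from the target to the prediction.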
Thanks
Samatrix Consulting Pvt Ltd
