1.1 Introduction: Deep Learning

Introduction
• Revisiting Basics
• The Neural Network

Building Intelligent Machines
• robotic assistants to clean our homes
• cars that drive themselves
• microscopes that automatically detect diseases

• Requires us to solve some of the most complex computational problems

Limits of Traditional Computer Programs?
What is hard for them to do?
• object recognition
• speech comprehension
• automated translation
Machine Learning Mechanics
• Deep learning is a subset of a more general field of artificial intelligence called machine learning.
• In machine learning, instead of teaching a computer a massive list of rules to solve the problem, we give it a model with which it can evaluate examples, and a small set of instructions to modify the model when it makes a mistake.
• We expect that, over time, a well-suited model would be able to solve the problem extremely accurately.
Machine Learning Mechanics
• Let's define our model to be a function h(x, θ).
• The input x is an example expressed in vector form.
▫ For example, if x were a grayscale image, the vector's components would be pixel intensities at each position.
• The input θ is a vector of the parameters that our model uses.
• Our machine learning program tries to perfect the values of these parameters as it is exposed to more and more examples.
Example: TPS Activity
• Predict exam performance based on the number of hours of sleep we get and the number of hours we study the previous day.
Solution
• Collect data: for each data point x = [x1 x2]T, record the number of hours of sleep we got (x1) and the number of hours we spent studying (x2).
• Goal: learn a model h(x, θ) with parameter vector θ = [θ0 θ1 θ2]T such that h(x, θ) predicts −1 (fail) when xT · [θ1 θ2]T + θ0 < 0, and 1 (pass) otherwise.
• It then turns out that, by selecting θ = [−24 3 4]T, our machine learning model makes the correct prediction on every data point.
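A minimal sketch of this classifier in numpy; the ±1 pass/fail encoding follows the formulation above:

Sample Code
import numpy as np

# Linear classifier h(x, theta) for the exam example; the +1/-1
# pass/fail encoding is an illustrative assumption.
def h(x, theta):
    # predicts 1 when theta1*x1 + theta2*x2 + theta0 >= 0, else -1
    return 1 if theta[1] * x[0] + theta[2] * x[1] + theta[0] >= 0 else -1

theta = np.array([-24, 3, 4])
print(h(np.array([4, 4]), theta))  # 3*4 + 4*4 - 24 = 4  -> 1 (pass)
print(h(np.array([2, 3]), theta))  # 3*2 + 4*3 - 24 = -6 -> -1 (fail)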
Solution
• An optimal parameter vector θ positions the classifier so that it makes as many correct predictions as possible.
Optimization
• How do we even come up with an optimal value for the parameter vector θ in the first place?
• An optimizer aims to maximize the performance of a machine learning model by iteratively tweaking its parameters until the error is minimized.
Example

y = f(WTx + b)
Limitations of the Linear Perceptron
• But these situations are only the tip of the iceberg.
• More complex problems, such as object recognition and text analysis, remain.
• Data becomes extremely high dimensional, and the relationships we want to capture become highly nonlinear.
• So? Deep Learning
The Neuron
• The neuron receives its inputs along antennae-like structures called dendrites.
• Each of these incoming connections is dynamically strengthened or weakened based on how often it is used.
• Inputs are summed together in the cell body.
The Neuron: McCulloch-Pitts Model

y = f(WTx + b)
Feed Forward Neural Networks
Sigmoid, Tanh, and ReLU Neurons
• f(z) = 1/(1 + e−z)
• Intuitively, this means that when the logit (z) is very small, the output of a logistic neuron is very close to 0.
• When the logit is very large, the output of the logistic neuron is close to 1.
Sample Code
import numpy as np

mmatrix = np.array([[1, 2, 3], [4, 5, 6]])
print(mmatrix)

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

sigmoid(mmatrix)

• Output:
array([[0.73105858, 0.88079708, 0.95257413],
       [0.98201379, 0.99330715, 0.99752738]])
Tanh Neuron
• Tanh neurons use a similar kind of S-shaped nonlinearity.
• The output of a tanh neuron ranges from −1 to 1.

f(x) = 2/(1 + e−2x) − 1
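A sketch mirroring the earlier sigmoid sample, reusing the same mmatrix:

Sample Code
import numpy as np

def tanh(X):
    # equivalent to np.tanh(X); outputs range from -1 to 1
    return 2 / (1 + np.exp(-2 * X)) - 1

mmatrix = np.array([[1, 2, 3], [4, 5, 6]])
tanh(mmatrix)

• Output:
array([[0.76159416, 0.96402758, 0.99505475],
       [0.9993293 , 0.9999092 , 0.99998771]])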
Comparison Sigmoid & Tanh
ReLU Neuron
• f(z) = max(0, z)

def relu(X):
    return np.maximum(0, X)

relu(mmatrix)
ReLU
• The main advantages of the ReLU activation function are as
follows:
• Sparsity:
▫ ReLU can introduce sparsity in the network by setting
negative values to zero. This means that only a subset of the
neurons is activated, which can lead to more efficient
computation and memory usage.
• Simplicity:
▫ ReLU is a simple and computationally efficient activation
function, as it involves only a single non-linear operation.
• Avoiding the vanishing gradient problem:
▫ Unlike activation functions such as sigmoid or tanh, ReLU
does not saturate for positive inputs.
▫ This property helps mitigate the vanishing gradient
problem, which can occur when the gradients become very small as they propagate back through many layers.
Softmax Output Layers
• We want the output vector to be a probability distribution over a set of mutually exclusive labels.
• A probability distribution gives us a better idea of our confidence in predictions:
[p0 p1 p2 p3 . . . p9], with Σi pi = 1
• This is achieved by using a special output layer called a softmax layer.
Softmax Output Layers
• The output of a neuron in a softmax layer depends on the outputs of all the other neurons in its layer.
• We require the sum of all the outputs to be equal to 1.
• Letting zi be the logit of the ith softmax neuron, we can achieve this normalization by setting its output to:
yi = e^zi / Σj e^zj
Sample Code
import numpy as np

def softmax(x):
    """Applies softmax to an input vector x."""
    e_x = np.exp(x)
    return e_x / e_x.sum()

x = np.array([1, 0, 3, 5])
y = softmax(x)
print(y)

• Output:
[0.01578405 0.00580663 0.11662925 0.86178007]


The Fast Food Problem
• Every single day, we purchase a
restaurant meal consisting of burgers,
fries, and sodas. We buy some number of
servings for each item. We want to be
able to predict how much a meal is going
to cost us, but the items don’t have price
tags. The only thing the cashier will tell
us is the total price of the meal. We want
to train a single linear neuron to solve
this problem. How do we do it?
The Fast Food Problem
Possible Solution
• Be intelligent about picking our training cases.
• E.g., for one meal we could buy only a single serving of burgers, for another only a single serving of fries, and for the last meal only a single serving of soda.
• In general, intelligently selecting training examples is a very good idea.
• By engineering a clever training set, you can make your neural network a lot more effective.
More General Approach
• Assume a large set of training examples.
• Calculate what the neural network will output on the ith training example using the simple formula.
• Train the neuron for optimal weights by minimizing the error:
E = ½ Σi (t(i) − y(i))2
• What if E = 0?
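A minimal sketch of this approach in numpy; the prices and meal counts below are invented purely for illustration:

Sample Code
import numpy as np

# Hypothetical data: each row of X counts servings of [burgers, fries, soda];
# the "true" prices are assumed here only to generate the meal totals t.
rng = np.random.default_rng(0)
true_prices = np.array([3.5, 1.5, 1.0])
X = rng.integers(0, 5, size=(100, 3)).astype(float)
t = X @ true_prices

w = np.zeros(3)                      # the weights (prices) we want to learn
eps = 0.01                           # learning rate
for _ in range(500):
    y = X @ w                        # predicted totals y(i)
    grad = -(t - y) @ X / len(X)     # gradient of E = 1/2 * sum (t(i) - y(i))^2, averaged
    w -= eps * grad
print(w.round(2))                    # approaches [3.5, 1.5, 1.0]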
Gradient Descent
• Let's say our linear neuron has only two inputs (and hence two weights, w1 and w2).
• Imagine a three-dimensional space where the horizontal dimensions correspond to the weights w1 and w2, and the vertical dimension corresponds to the value of the error function E.
Gradient Descent
• Visualize the error surface as a set of elliptical contours.
• The minimum error is at the center of the ellipses.
• Contours correspond to settings of w1 and w2 that evaluate to the same value of E.
• The closer the contours are to each other, the steeper the slope.
• The direction of steepest descent is always perpendicular to the contours. This direction is expressed as a vector known as the gradient.
The Delta Rule and Learning Rates
• In practice, at each step of moving perpendicular to
the contour, we need to determine how far we want
to walk before recalculating our new direction.
• This distance needs to depend on the steepness of
the surface. Why?
• The closer we are to the minimum, the shorter we
want to step forward
• We know we are close to the minimum, because the
surface is a lot flatter, so we can use the steepness as
an indicator of how close we are to the minimum
• Learning rate, ε
Example GD
• Let's take a simple quadratic function.
• Because it is a univariate function, its gradient is simply its derivative.
• With a learning rate of 0.1 and a starting point of x = 9, we can easily calculate each step by hand. Let's do it for the first 3 steps, comparing the GD algorithm for learning rates of 0.1 and 0.8 (see the sketch below).
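A sketch assuming the quadratic f(x) = x², whose gradient is 2x (the slide's exact function is an assumption here):

Sample Code
# Gradient descent on an assumed quadratic f(x) = x**2, with gradient 2x.
def gd(lr, x=9.0, steps=3):
    for i in range(steps):
        x = x - lr * 2 * x           # x_new = x - lr * f'(x)
        print(f"lr={lr}: step {i + 1}, x = {x:.3f}")

gd(0.1)  # 7.200, 5.760, 4.608   -> smooth, monotone convergence
gd(0.8)  # -5.400, 3.240, -1.944 -> overshoots and oscillates, yet still converges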
Vanishing Gradient Problem
• Activation functions like the sigmoid squash a large input space into a small output range between 0 and 1.
• A large change in the input of the sigmoid function can therefore cause only a small change in the output.
Vanishing Gradient Problem
• The maximum of the sigmoid's derivative is 1/4, and the function horizontally asymptotes at 0.
• In other words, the output of the derivative of the sigmoid function is always between 0 and 1/4.
• Mathematically, it ranges over (0, 1/4].
Vanishing Gradient Problem
• By multiplying these derivatives together, we are multiplying values in the range (0, 1/4].
• Any two numbers between 0 and 1 multiplied with each other will simply result in a smaller value; for example, 1/3 × 1/3 = 1/9.
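A quick numerical sketch of this compounding effect (not tied to any particular network):

Sample Code
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at z = 0 with value 1/4.
# Multiplying one such factor per layer shrinks the gradient geometrically.
d = sigmoid(0.0) * (1 - sigmoid(0.0))   # 0.25, the largest possible factor
for depth in [1, 5, 10, 20]:
    print(depth, d ** depth)            # 0.25, ~9.8e-04, ~9.5e-07, ~9.1e-13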
Gradient Descent with Sigmoidal Neurons
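As a sketch of the standard chain-rule computation: for a sigmoidal neuron with output y(i) = σ(z(i)), logit z(i) = Σk wk xk(i), and squared error E = ½ Σi (t(i) − y(i))2, we have

∂E/∂wk = Σi (∂E/∂y(i)) (∂y(i)/∂z(i)) (∂z(i)/∂wk) = Σi xk(i) y(i) (1 − y(i)) (y(i) − t(i))

using the fact that σ′(z) = σ(z)(1 − σ(z)). The gradient descent update therefore becomes

Δwk = ε Σi xk(i) y(i) (1 − y(i)) (t(i) − y(i))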
Backpropagation Algorithm
We get a certain loss at the output, and we try to figure out who is responsible for this loss.
So we talk to the output layer and say, "Hey! You are not producing the desired output, better take responsibility."
The output layer says, "Well, I take responsibility for my part, but please understand that I am only as good as the hidden layers and weights below me."
After all, the output is computed from the hidden activations and the weights feeding into the output layer, so the blame must be shared with them.
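A minimal numpy sketch of this blame assignment for an assumed 2-3-1 sigmoid network with squared error; the data and layer sizes are invented for illustration:

Sample Code
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Assumed toy setup: 4 examples, a 2-3-1 network, squared error.
rng = np.random.default_rng(1)
X = rng.random((4, 2)); t = rng.random((4, 1))
W1 = rng.random((2, 3)); W2 = rng.random((3, 1))

h = sigmoid(X @ W1)                        # forward pass: hidden layer
y = sigmoid(h @ W2)                        # forward pass: output layer

delta2 = (y - t) * y * (1 - y)             # output layer's share of the blame
delta1 = (delta2 @ W2.T) * h * (1 - h)     # blame passed back to the hidden layer
W2 -= 0.1 * h.T @ delta2                   # gradient step on each weight matrix
W1 -= 0.1 * X.T @ delta1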
Stochastic Gradient Descent
• Stochastic gradient descent is an extension of gradient descent.
• A recurring problem in machine learning is that large training sets are necessary for good generalization.
• However, large training sets are also more computationally expensive.
• The negative conditional log-likelihood of the training data can be written as:
J(θ) = (1/m) Σi=1..m L(x(i), y(i), θ), where L(x, y, θ) = −log p(y | x; θ) is the per-example loss
Stochastic Gradient Descent
• For additive cost functions, gradient descent requires computing:
∇θ J(θ) = (1/m) Σi=1..m ∇θ L(x(i), y(i), θ)
• The computational cost of this operation is O(m).
• As the training set size grows to billions of examples, the time to take a single gradient step becomes prohibitively long.
Stochastic Gradient Descent
• Specifically, on each step of the algorithm, we can sample a minibatch of examples B = {x(1), . . . , x(m′)} drawn uniformly from the training set.
• The minibatch size m′ is typically chosen to be a relatively small number of examples.
• The estimate of the gradient is formed as:
g = (1/m′) ∇θ Σi=1..m′ L(x(i), y(i), θ)
using examples from the minibatch B.
• The stochastic gradient descent algorithm then follows the estimated gradient downhill: θ ← θ − εg, where ε is the learning rate.
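A sketch of minibatch SGD for a linear model; the synthetic data, learning rate, and batch size are illustrative assumptions:

Sample Code
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 3))                      # synthetic training set
theta_true = np.array([1.0, -2.0, 0.5])
t = X @ theta_true + rng.normal(scale=0.1, size=10000)

theta = np.zeros(3)
eps, m_prime = 0.05, 32                              # learning rate, minibatch size m'
for step in range(2000):
    idx = rng.integers(0, len(X), m_prime)           # sample minibatch B uniformly
    Xb, tb = X[idx], t[idx]
    g = Xb.T @ (Xb @ theta - tb) / m_prime           # gradient estimate from B
    theta -= eps * g                                 # follow the estimate downhill
print(theta.round(2))                                # approaches [1.0, -2.0, 0.5]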
Mini Batch
Figure: SGD vs. mini-batch gradient descent (batch size = 10)
Capacity, Overfitting and Underfitting
• The central challenge in machine learning is
that we must perform well on new, previously
unseen inputs.
• The ability to perform well on previously
unobserved inputs is called generalization
• The factors determining how well a machine
learning algorithm will perform are its ability
to:
▫ Make the training error small.
▫ Make the gap between training and test error
small.
Capacity, Overfitting and Underfitting
• Underfitting occurs when the model is not able to
obtain a sufficiently low error value on the training set.
• Overfitting occurs when the gap between the training
error and test error is too large.
• We can control whether a model is more likely to
overfit or underfit by altering its capacity.
• A model's capacity is its ability to fit a wide variety of
functions.
• Models with low capacity may struggle to fit the
training set.
• Models with high capacity can overfit by memorizing
properties of the training set
Preventing Overfitting in Deep Neural Networks
• Regularization modifies the objective function that we minimize by adding additional terms that penalize large weights.
• We change the objective function so that it becomes Error + λ f(θ), where f(θ) grows larger as the components of θ grow larger, and λ is the regularization strength.
• The most common type of regularization in machine learning is L2 regularization.
• L2 regularization is also commonly referred to as weight decay.
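A sketch of how the L2 penalty enters the objective and its gradient; the function and parameter names here are illustrative:

Sample Code
import numpy as np

# Error + lambda * f(theta), with f(theta) = sum of squared weights.
def l2_regularized(data_loss, data_grad, w, lam=0.01):
    loss = data_loss + lam * np.sum(w ** 2)   # penalized objective
    grad = data_grad + 2 * lam * w            # extra term shrinks weights each step
    return loss, grad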
Preventing Overfitting in Deep Neural Networks
• Another common type of regularization is L1 regularization: we add the term λ|w| for every weight w in the neural network.
• L1 regularization has the property of leading the weight vectors to become sparse during optimization (i.e., very close to exactly zero).
• Neurons with L1 regularization end up using only a small subset of their most important inputs.
• L1 regularization is very useful when you want to understand exactly which features are contributing to a decision.
Preventing Overfitting in Deep Neural Networks
• Another approach is dropout.
• While training, dropout is implemented by keeping a neuron active only with some probability p (a hyperparameter), and setting it to zero otherwise.
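A minimal sketch of (inverted) dropout applied to a layer's activations:

Sample Code
import numpy as np

def dropout(activations, p=0.5, training=True):
    # Keep each neuron with probability p; rescale by 1/p so the
    # expected activation is unchanged. At test time, do nothing.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) < p) / p
    return activations * mask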
Challenges Motivating Deep Learning
• Simple machine learning algorithms work very well on a wide variety of important problems.
• However, they have not succeeded in solving the central problems in AI, such as recognizing speech or recognizing objects.
• The development of deep learning was motivated in part by the failure of traditional algorithms to generalize well on such AI tasks.
• High-dimensional spaces also impose high computational costs.
Challenges Motivating Deep Learning
• The Curse of Dimensionality
▫ Many machine learning problems become
exceedingly difficult when the number of
dimensions in the data is high.
▫ This phenomenon is known as the curse of
dimensionality
Challenges Motivating Deep Learning
• Local Constancy
▫ Local constancy is a concept related to the assumption
that data samples that are close to each other in the input
space should have similar output predictions.
▫ In other words, if two data points are similar in their
features or attributes, the model's output for these points
should also be similar
• Smoothness Regularization
▫ Smoothness regularization is a technique used in
machine learning models, including deep learning
models, to encourage smooth transitions in the
predictions across the input space.
▫ The objective of smoothness regularization is to penalize models for producing sharp, erratic, or noisy predictions, which can lead to overfitting and poor generalization on unseen data.
Challenges Motivating Deep Learning
• Manifold Learning
▫ Manifold learning in deep learning refers to the
use of deep neural networks to learn low-
dimensional representations (manifolds) of high-
dimensional data.
▫ The key idea behind manifold learning is that the
data often lies on a lower-dimensional manifold
within the high-dimensional input space.
▫ By discovering this underlying manifold, deep
learning models can extract meaningful and
compact representations that capture the essential
structure of the data.
Tensorflow 2.0
• TensorFlow is widely used as a machine learning
implementation library.
• It was created by Google as part of the Google
Brain project
• Later made available as an open source product
Tensorflow 2.0
• Tensors are the building blocks of TensorFlow, as all computations are done using tensors.
• A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.
• Tensors can be of two types: constant or variable.
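A short sketch of the two tensor types:

Sample Code
import tensorflow as tf

c = tf.constant([[1., 2.], [3., 4.]])   # constant tensor: immutable
v = tf.Variable(tf.zeros((2, 2)))       # variable tensor: holds mutable state
v.assign_add(c)                         # variables can be updated in place
print(v.numpy())                        # [[1. 2.] [3. 4.]]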
Tensorflow 2.0
1. TensorFlow 2.0 doesn't require the graph definition.
2. TensorFlow 2.0 doesn't require session execution.
3. TensorFlow 2.0 doesn't make it mandatory to initialize variables.
4. TensorFlow 2.0 doesn't require variable sharing via scopes.
Example
g = tf.Graph() a = tf.constant([[10,10],
with g.as_default(): [11.,1.]])
a = tf.constant([[10,10],[11.,1.]])x = tf.constant([[1.,0.],[0.,1.]])
x = tf.constant([[1.,0.],[0.,1.]]) b = tf.Variable(12.)
b = tf.Variable(12.) y = tf.matmul(a, x) + b
y = tf.matmul(a, x) + b
print(y.numpy())
init_op =
tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init_op)
print(sess.run(y))
Summary
• Machine Learning Basics
• Neuron
• Feed forward Network
• Gradient Descent
• Backpropagation Algorithm
• Challenges
• Tensorflow 2.0
