1.1 Introduction
Revisiting Basics
The Neural Network
Building Intelligent Machines
• robotic assistants to clean our homes
• cars that drive themselves
• microscopes that automatically detect diseases
• object recognition
• speech comprehension
• automated translation
Machine Learning Mechanics
Deep learning is a subset of a more general field of artificial
intelligence called machine learning.
y = f(Wᵀx + b)
Limitations of the Linear Perceptron
y = f(Wᵀx + b)
Feed Forward Neural Networks
Sigmoid, Tanh, and ReLU Neurons
• f(z) = 1/(1 + e^(−z))
• Intuitively, this means that when the logit (z) is
very small, the output of a logistic neuron is very
close to 0.
• When the logit is very large, the output of the
logistic neuron is close to 1.
Sample Code
import numpy as np

mmatrix = np.array([[1, 2, 3], [4, 5, 6]])
print(mmatrix)

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

sigmoid(mmatrix)
• output:
array([[0.73105858, 0.88079708, 0.95257413],
[0.98201379, 0.99330715, 0.99752738]])
Tanh Neuron
• Tanh neurons use a similar kind of S-shaped nonlinearity
• The output of tanh neurons ranges from
−1 to 1
f(x) = 2/(1 + e^(−2x)) − 1
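A quick NumPy sketch of this formula, mirroring the earlier sigmoid example (mmatrix is reused from that slide; the hand-written version matches NumPy's built-in np.tanh):

```python
import numpy as np

mmatrix = np.array([[1, 2, 3], [4, 5, 6]])

def tanh(X):
    # f(x) = 2/(1 + e^(-2x)) - 1
    return 2 / (1 + np.exp(-2 * X)) - 1

print(tanh(mmatrix))  # same values as np.tanh(mmatrix)
```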
Comparison Sigmoid & Tanh
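One way to make the comparison concrete: tanh is a stretched and recentered sigmoid, tanh(x) = 2·sigmoid(2x) − 1, which is why both share the S shape but tanh spans (−1, 1) instead of (0, 1). A sketch verifying this numerically:

```python
import numpy as np

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

x = np.linspace(-5, 5, 11)
# tanh is the sigmoid rescaled from (0, 1) to (-1, 1)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```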
ReLU Neuron
• f(z) = max(0, z)

def relu(X):
    return np.maximum(0, X)

relu(mmatrix)
ReLU
• The main advantages of the ReLU activation function are as
follows:
• Sparsity:
▫ ReLU can introduce sparsity in the network by setting
negative values to zero. This means that only a subset of the
neurons is activated, which can lead to more efficient
computation and memory usage.
• Simplicity:
▫ ReLU is a simple and computationally efficient activation
function, as it involves only a single non-linear operation.
• Avoiding the vanishing gradient problem:
▫ Unlike activation functions such as sigmoid or tanh, ReLU
does not saturate for positive inputs.
▫ This property helps mitigate the vanishing gradient
problem, which can occur when the gradients become very
small as they are propagated back through many layers.
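The saturation point can be illustrated numerically. Using the standard derivative formulas, sigmoid'(z) = σ(z)(1 − σ(z)) and relu'(z) = 1 for z > 0 (the sample points below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([0.0, 2.0, 5.0, 10.0])
sig_grad = sigmoid(z) * (1 - sigmoid(z))   # shrinks toward 0 as z grows
relu_grad = (z > 0).astype(float)          # stays exactly 1 for z > 0

print(sig_grad)    # 0.25 at z=0, then vanishingly small
print(relu_grad)
```

The sigmoid gradient peaks at 0.25 and decays to roughly 4.5e-5 by z = 10, while the ReLU gradient never shrinks for positive inputs.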
Softmax Output Layers
• Want your output vector to be a probability
distribution over a set of mutually exclusive
labels.
• Probability distribution gives us a better idea of
confidence in predictions
Σᵢ pᵢ = 1 (i = 0, …, 9)
[p0 p1 p2 p3 . . . p9]
def softmax(x):
    """Applies softmax to an input x."""
    e_x = np.exp(x - x.max())  # subtract the max for numerical stability
    return e_x / e_x.sum()

x = np.array([1, 0, 3, 5])
y = softmax(x)
• What if E=0?
Gradient Descent
• Let’s say our linear neuron has only two inputs (and thus
two weights, w1 and w2).
• Imagine a three-dimensional space where the horizontal
dimensions correspond to the weights w1 and w2, and the
vertical dimension corresponds to the value of the error
function E.
Gradient Descent
• Visualize surface as a set of elliptical contours
• The minimum error is at the center of the
ellipses
• Contours correspond to settings of w1 and w2
that evaluate to the same value of E
• The closer the contours are to each other, the
steeper the slope
• The direction of the steepest descent is always
perpendicular to the contours. This direction is
expressed as a vector known as the gradient
The Delta Rule and Learning Rates
• In practice, at each step of moving perpendicular to
the contour, we need to determine how far we want
to walk before recalculating our new direction.
• This distance needs to depend on the steepness of
the surface. Why?
• The closer we are to the minimum, the shorter we
want to step forward
• We know we are close to the minimum, because the
surface is a lot flatter, so we can use the steepness as
an indicator of how close we are to the minimum
• Learning rate, ε
Example GD
• Let’s take a simple quadratic function
defined as:
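The quadratic itself is not shown on this slide; as a stand-in, here is a minimal gradient-descent sketch assuming f(w) = w², so the gradient is 2w (the function, starting point, and learning rate are illustrative assumptions):

```python
# Gradient descent on f(w) = w^2, whose gradient is df/dw = 2w.
epsilon = 0.1   # learning rate
w = 5.0         # initial weight

for step in range(100):
    grad = 2 * w             # gradient of the quadratic at w
    w = w - epsilon * grad   # delta rule: step against the gradient

print(w)  # approaches the minimum at w = 0
```

Each update multiplies w by (1 − 2ε) = 0.8, so the iterate shrinks geometrically toward the minimum.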
1. TensorFlow 2.0 doesn’t require the graph definition.
2. TensorFlow 2.0 doesn’t require the session execution.
3. TensorFlow 2.0 doesn’t make it mandatory to initialize variables.
4. TensorFlow 2.0 doesn’t require variable sharing via scopes.
Example
# TensorFlow 1.x: define a graph, then run it in a session
g = tf.Graph()
with g.as_default():
    a = tf.constant([[10, 10], [11., 1.]])
    x = tf.constant([[1., 0.], [0., 1.]])
    b = tf.Variable(12.)
    y = tf.matmul(a, x) + b
    init_op = tf.global_variables_initializer()

with tf.Session(graph=g) as sess:
    sess.run(init_op)
    print(sess.run(y))

# TensorFlow 2.0: eager execution, no graph or session needed
a = tf.constant([[10, 10], [11., 1.]])
x = tf.constant([[1., 0.], [0., 1.]])
b = tf.Variable(12.)
y = tf.matmul(a, x) + b
print(y.numpy())
Summary
• Machine Learning Basics
• Neuron
• Feed-Forward Network
• Gradient Descent
• Backpropagation Algorithm
• Challenges
• TensorFlow 2.0