Lecture 2 - GD Linear Regression

The document discusses activation functions used in deep learning neural networks. It defines eight activation functions (sigmoid, tanh, ReLU, Leaky ReLU, Parametric ReLU, Exponential Linear Unit (ELU), SoftPlus, and softmax) and gives the mathematical equation for each. It also poses questions about why activation functions are needed and whether they should be defined for a layer or for a neuron, and it introduces gradient descent and linear regression.


Deep Learning

Vazgen Mikayelyan

October 20, 2020



Activation functions

1. Sigmoid: σ(x) = 1 / (1 + e^(−x))
2. Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
3. Rectified linear unit: ReLU(x) = max(0, x)
4. Leaky ReLU: LR(x) = 0.01x for x < 0, and x for x ≥ 0
5. Parametric ReLU: PR(x) = ax for x < 0, and x for x ≥ 0
6. Exponential linear unit: ELU(x) = a(e^x − 1) for x < 0, and x for x ≥ 0
7. SoftPlus: SP(x) = log(1 + e^x)
8. Softmax: S(x₁, x₂, …, xₙ) = (e^(x₁) / Σᵢ e^(xᵢ), e^(x₂) / Σᵢ e^(xᵢ), …, e^(xₙ) / Σᵢ e^(xᵢ)), where each sum runs over i = 1, …, n
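The definitions above translate directly into code. Below is a minimal NumPy sketch of the eight functions; the function names, the default values of a, and the use of NumPy are my choices rather than anything specified in the slides.

```python
import numpy as np

def sigmoid(x):
    # σ(x) = 1 / (1 + e^(−x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
    return np.tanh(x)

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def leaky_relu(x):
    # 0.01x for x < 0, x for x ≥ 0
    return np.where(x < 0, 0.01 * x, x)

def parametric_relu(x, a=0.1):
    # ax for x < 0, x for x ≥ 0; in practice a is a learned parameter
    return np.where(x < 0, a * x, x)

def elu(x, a=1.0):
    # a(e^x − 1) for x < 0, x for x ≥ 0
    return np.where(x < 0, a * (np.exp(x) - 1.0), x)

def softplus(x):
    # SP(x) = log(1 + e^x)
    return np.log1p(np.exp(x))

def softmax(x):
    # S(x)_j = e^(x_j) / Σ_i e^(x_i); shifting by max(x) leaves the result
    # unchanged but avoids overflow for large inputs
    z = np.exp(x - np.max(x))
    return z / np.sum(z)
```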


Questions

1 Why do we need activation functions?
2 How should we define activation functions, for a layer or for a neuron?


Outline

1 Gradient Descent

2 Linear and Logistic Regressions



Gradient Descent

Let f : R^k → R be a convex function whose global minimum we want to find. Gradient descent is based on the fact that the direction of fastest decrease of f is the direction opposite to its gradient, which gives the iteration

xₙ₊₁ = xₙ − α∇f(xₙ),

where α > 0 is the step size and x₀ ∈ R^k is an arbitrary starting point.
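As a concrete illustration of this update rule, here is a short sketch that runs the iteration on a simple convex function; the example f(x) = ‖x‖², the step size α = 0.1, and the fixed number of steps are assumptions of mine, not values from the lecture.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, n_steps=100):
    """Iterate x_{n+1} = x_n − α ∇f(x_n) starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * grad_f(x)
    return x

# Example: f(x) = ||x||^2 is convex with gradient ∇f(x) = 2x,
# so the iterates should approach the global minimum at 0.
grad_f = lambda x: 2.0 * x
x_min = gradient_descent(grad_f, x0=[3.0, -2.0], alpha=0.1, n_steps=100)
print(x_min)  # close to [0, 0]
```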




Linear Regression

Let (xᵢ, yᵢ), i = 1, …, n, with xᵢ ∈ R^k and yᵢ ∈ R, be our training data. Consider the function

f(x) = f(x¹, x², …, xᵏ) = w¹x¹ + w²x² + … + wᵏxᵏ + b = wᵀx + b.

Our aim is to find parameters b, w¹, w², …, wᵏ such that

f(xᵢ) ≈ yᵢ, i = 1, …, n.

We choose the squared L2 distance (mean squared error) as our loss function:

(1/n) Σₗ₌₁ⁿ (f(xₗ) − yₗ)².
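Putting the two pieces together, the sketch below fits f(x) = wᵀx + b by gradient descent on the mean squared error above; the synthetic data, the learning rate, and the iteration count are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (x_i, y_i), x_i in R^k, y_i in R (assumed for illustration)
n, k = 200, 3
X = rng.normal(size=(n, k))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.7
y = X @ true_w + true_b + 0.1 * rng.normal(size=n)

# Parameters of the model f(x) = w^T x + b
w = np.zeros(k)
b = 0.0
alpha = 0.1  # step size

for _ in range(500):
    residual = X @ w + b - y             # f(x_l) − y_l for all l
    grad_w = 2.0 / n * X.T @ residual    # gradient of the loss w.r.t. w
    grad_b = 2.0 / n * np.sum(residual)  # gradient of the loss w.r.t. b
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)  # should be close to true_w and true_b
```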


Questions

1 Should we minimize the loss function using gradient descent?
2 Can you represent this model as a neural network?
