Detailed Sigmoid and Softmax Activation Function
$$g(z) = \frac{1}{1 + e^{-z}} \tag{1}$$
where $z = mx + b$.
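As a quick sanity check on equation (1), here is a minimal NumPy sketch of the sigmoid (the function name and example values are illustrative, not part of the recitation):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; g saturates toward 0 and 1 as |z| grows
print(sigmoid(0.0))                      # 0.5
print(sigmoid(np.array([-8.0, 8.0])))    # ~[0.000335, 0.999665]
```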
(a) Let’s do a quick concept check. In what situation would we use logistic regression instead of linear
regression?
Linear regression assumes the output is a linear function of the inputs, while logistic regression models the data using a sigmoid function. We use logistic regression as a classification technique (when the labels are binary), whereas we use linear regression when we are predicting a real-valued output from our data.
(b) We see that $g(z)$ falls strictly within the interval $(0, 1)$. Given what we have learned in class and discussed so far,
what probability distribution does this graph represent?
(c) Now, let's consider an $\mathbb{R}^3$ space. For weight vector $\theta = [1, 4, 3]^T$, define (i) some $x$ such that $\theta^T x > 0$. What is the resulting $g(z)$? Now, (ii) some $x$ such that $\theta^T x = 0$. What is the resulting $g(z)$? Explain the overall relationship between $g(z)$ and $\theta^T x$.
There are multiple correct values of $x$. We consider one specific example for each case and verify that it works.
(i) $x = [1, 1, 1]^T$. So $\theta^T x = 1 + 4 + 3 = 8 > 0$. The resulting $g(z)$ is $\frac{1}{1 + e^{-8}}$. This value is extremely close to 1, but still less than 1 (approximately 0.9997).
(ii) $x = [7, -1, -1]^T$. So $\theta^T x = 7 - 4 - 3 = 0$. The resulting $g(z)$ is $\frac{1}{1 + e^{0}} = \frac{1}{1 + 1} = 0.5$.
Overall, we see that the value of $g(z)$ depends on whether $z > 0$, $z < 0$, or $z = 0$: if $z > 0$, then $g(z) > 0.5$; if $z < 0$, then $g(z) < 0.5$; and if $z = 0$, then $g(z) = 0.5$. Based on the value of $g(z)$, we choose the appropriate binary class. So, because $z = \theta^T x$, we see that $\theta^T x = 0$ represents our decision boundary, and when $\theta^T x = 0$, $g(z) = 0.5$.
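A minimal NumPy check of the two examples above, using the same $\theta$ and $x$ values (variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([1.0, 4.0, 3.0])
x_pos = np.array([1.0, 1.0, 1.0])    # theta^T x = 8 > 0
x_bnd = np.array([7.0, -1.0, -1.0])  # theta^T x = 0, i.e., on the decision boundary

print(theta @ x_pos, sigmoid(theta @ x_pos))  # 8.0, ~0.99966
print(theta @ x_bnd, sigmoid(theta @ x_bnd))  # 0.0, 0.5
```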
In a $K$-class classification setting, we have training set $D = \{(x^{(i)}, y^{(i)}) \mid i = 1, \ldots, n\}$ where $x^{(i)} \in \mathbb{R}^M$ is a feature vector and $y^{(i)} \in \{1, 2, \ldots, K\}$.
2.1 One-versus-All
Using a one-vs-all classifier (with logistic regression) to predict the label for a new data point involves the following two steps (a short code sketch follows the note below):
1. Learn one binary classifier for each class. For each $1 \le k \le K$, treat samples of class $k$ as positive examples and samples from all other classes as negative samples. Perform logistic regression on this dataset to learn:
$$p(y = k \mid x; W, b) = \frac{1}{1 + e^{-(w_k^T x + b_k)}}$$
2. Majority Vote: predict the class whose binary classifier assigns the highest probability,
$$\hat{y} = \underset{k}{\arg\max}\; p(y = k \mid x; W, b)$$
Note: this method can be used with any binary classifier (including binary logistic regression, binary SVM classifiers, etc.).
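A minimal sketch of the one-vs-all procedure, assuming scikit-learn's LogisticRegression as the binary classifier; the toy dataset and 0-indexed labels are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ova_fit(X, y, K):
    """Fit one binary logistic-regression classifier per class (class k vs. the rest)."""
    classifiers = []
    for k in range(K):
        clf = LogisticRegression()
        clf.fit(X, (y == k).astype(int))
        classifiers.append(clf)
    return classifiers

def ova_predict(classifiers, X):
    """Predict the class whose binary classifier assigns the highest probability."""
    # column k holds the k-th classifier's probability that x belongs to class k
    scores = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
    return np.argmax(scores, axis=1)

# Toy usage: three well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2)) + np.repeat(np.array([[0, 0], [3, 0], [0, 3]]), 30, axis=0)
y = np.repeat(np.arange(3), 30)
print(ova_predict(ova_fit(X, y, K=3), X[:5]))   # mostly class 0 for the first blob
```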
For a vector $z = [z_1, z_2, \ldots, z_K]^T \in \mathbb{R}^K$, the softmax function outputs a vector of the same dimension, $\mathrm{softmax}(z) \in \mathbb{R}^K$, where each of its entries is defined as:
$$\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{c=1}^{K} e^{z_c}}, \quad \text{for all } k = 1, 2, \ldots, K$$
Therefore, the softmax function is useful for converting a vector of arbitrary real numbers into a discrete probability distribution of $K$ probabilities proportional to the exponentials of the input vector components. In particular, larger input components correspond to larger probabilities.
Softmax is often used as the last layer of a neural network, to map the non-normalized outputs of the network to a probability distribution over the predicted output classes.
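A minimal NumPy sketch of the softmax function; the max-subtraction step is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """softmax(z)_k = exp(z_k) / sum_c exp(z_c)."""
    z = np.asarray(z, dtype=float)
    expz = np.exp(z - z.max())   # subtracting the max leaves the ratios unchanged
    return expz / expz.sum()

p = softmax([2.0, 1.0, 0.1])
print(p, p.sum())   # larger inputs get larger probabilities; entries sum to 1
```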
Softmax Regression
For K-class classification, Softmax Regression has a parametric model of the form:
$$p(y^{(i)} = k \mid x^{(i)}; W, b) = \frac{\exp(w_k^T x^{(i)} + b_k)}{\sum_{c=1}^{K} \exp(w_c^T x^{(i)} + b_c)} \tag{2}$$
Therefore, the output of the softmax model is $\hat{y} = \underset{k}{\arg\max}\; p(y^{(i)} = k \mid x^{(i)}; W, b)$.
The intermediate result (a vector) output by the softmax function is $\left[\, p(y^{(i)} = 1 \mid x^{(i)}; W, b), \ldots, p(y^{(i)} = K \mid x^{(i)}; W, b) \,\right]^T$.
Note: now $W$ is a matrix! Let $W$ be the $M \times K$ matrix obtained by concatenating $w_1, w_2, \ldots, w_K$, where each $w_i \in \mathbb{R}^M$.
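A minimal NumPy sketch of the softmax-regression forward pass under the conventions above ($W$ is $M \times K$, $b \in \mathbb{R}^K$); the random parameters and function name are illustrative:

```python
import numpy as np

def softmax_probs(x, W, b):
    """Return the length-K vector of p(y = k | x; W, b) for one example x of length M."""
    scores = W.T @ x + b                   # entry k is w_k^T x + b_k
    expz = np.exp(scores - scores.max())   # stabilize; probabilities are unchanged
    return expz / expz.sum()

M, K = 4, 3
rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(M, K)), rng.normal(size=K), rng.normal(size=M)

p = softmax_probs(x, W, b)
print(p, p.sum())      # probabilities over the K classes, summing to 1
print(np.argmax(p))    # predicted class (0-indexed here)
```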
For $K = 2$, the softmax regression model becomes:
$$p(y^{(i)} = k \mid x^{(i)}; W, b) = \frac{\exp(w_k^T x^{(i)} + b_k)}{\sum_{c=1}^{2} \exp(w_c^T x^{(i)} + b_c)} \tag{3}$$
$$= \begin{cases} \dfrac{1}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))} & \text{if } k = 1 \\[2ex] 1 - \dfrac{1}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))} = \dfrac{\exp(-(w_\alpha^T x^{(i)} + \beta))}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))} & \text{if } k = 2 \end{cases} \tag{4}$$
where $w_\alpha = w_1 - w_2$ and $\beta = b_1 - b_2$, as derived below.
Note that the softmax model contains two "sets" of weights ($w_1, b_1$ and $w_2, b_2$), whereas the logistic regression output only contains one "set" of weights ($w_\alpha$ and $\beta$). Therefore, this simple example with $K = 2$ not only shows that softmax regression is a generalization of logistic regression, but also shows that softmax regression has a "redundant" set of parameters.
Plugging $K = 2$ into (2) gives
$$p(y^{(i)} = 1 \mid x^{(i)}; W, b) = \frac{\exp(w_1^T x^{(i)} + b_1)}{\exp(w_1^T x^{(i)} + b_1) + \exp(w_2^T x^{(i)} + b_2)}, \qquad
p(y^{(i)} = 2 \mid x^{(i)}; W, b) = \frac{\exp(w_2^T x^{(i)} + b_2)}{\exp(w_1^T x^{(i)} + b_1) + \exp(w_2^T x^{(i)} + b_2)}.$$

For the first class, divide the numerator and the denominator by $\exp(w_1^T x^{(i)} + b_1)$:
$$\begin{aligned}
p(y^{(i)} = 1 \mid x^{(i)}; W, b) &= \frac{\exp(w_1^T x^{(i)} + b_1)}{\exp(w_1^T x^{(i)} + b_1) + \exp(w_2^T x^{(i)} + b_2)} \\
&= \frac{\exp(w_1^T x^{(i)} + b_1)/\exp(w_1^T x^{(i)} + b_1)}{\left(\exp(w_1^T x^{(i)} + b_1) + \exp(w_2^T x^{(i)} + b_2)\right)/\exp(w_1^T x^{(i)} + b_1)} \\
&= \frac{\exp((w_1 - w_1)^T x^{(i)} + (b_1 - b_1))}{\exp((w_1 - w_1)^T x^{(i)} + (b_1 - b_1)) + \exp((w_2 - w_1)^T x^{(i)} + (b_2 - b_1))} \\
&= \frac{1}{1 + \exp((w_2 - w_1)^T x^{(i)} + (b_2 - b_1))} \\
&= \frac{1}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))}, \quad \text{where } w_\alpha = -(w_2 - w_1),\ \beta = -(b_2 - b_1).
\end{aligned}$$

For the second class:
$$\begin{aligned}
p(y^{(i)} = 2 \mid x^{(i)}; W, b) &= 1 - p(y^{(i)} = 1 \mid x^{(i)}; W, b) \\
&= 1 - \frac{1}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))} \\
&= \frac{\exp(-(w_\alpha^T x^{(i)} + \beta))}{1 + \exp(-(w_\alpha^T x^{(i)} + \beta))}, \quad \text{where } w_\alpha = -(w_2 - w_1),\ \beta = -(b_2 - b_1).
\end{aligned}$$
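A quick NumPy check of this equivalence on random parameters (all names illustrative): the two-class softmax probability for class 1 matches the logistic-regression form with $w_\alpha = w_1 - w_2$ and $\beta = b_1 - b_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
w1, w2 = rng.normal(size=M), rng.normal(size=M)
b1, b2 = rng.normal(), rng.normal()
x = rng.normal(size=M)

# Two-class softmax probability of class 1
s1, s2 = np.exp(w1 @ x + b1), np.exp(w2 @ x + b2)
p1_softmax = s1 / (s1 + s2)

# Equivalent logistic-regression form
w_alpha, beta = w1 - w2, b1 - b2
p1_logistic = 1.0 / (1.0 + np.exp(-(w_alpha @ x + beta)))

print(np.isclose(p1_softmax, p1_logistic))   # True: the two parameterizations agree
```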