
Chapter 1

Classification using Softmax Regression

Softmax regression, or multinomial logistic regression, is a widely used algorithm for multiclass
classification. This chapter provides an introduction to the basic concepts of softmax regression.

Figure 1.1: Logistic Regression vs Softmax Regression

1.1. Softmax Classifier


When given an instance x, the Softmax Regression model first computes a score sk (x) for each
class k, then estimates the probability of each class by applying the softmax function (also called
the normalized exponential) to the scores [1]:

$$s_k(x) = x^{\top}\theta_k$$

Each class has its own dedicated parameter vector θk . All these vectors are typically stored as
rows in a parameter matrix Θ.
Once the score of every class for the instance x is calculated, we can estimate the probability p̂k
that the instance belongs to class k by running the scores through the softmax function [1]:

$$\hat{p}_k = \sigma_k(s(x)) = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))}$$

In the above equation:


• K is the number of classes.

• s(x) is a vector containing the scores of each class for the instance x.
• σk (s(x)) is the estimated probability that the instance x belongs to class k, given the scores
of each class for that instance.
Just like the Logistic Regression, the Softmax Regression predicts the class with the highest
estimated probability (which is simply the class with the highest score), as shown in the equation
below [1]:
$$\hat{y} = \operatorname*{arg\,max}_{k}\, \sigma_k(s(x)) = \operatorname*{arg\,max}_{k}\, s_k(x)$$
If the dataset is two dimensional, Softmax Regression partitions the plane into multiple polygonal
regions using lines, as depicted in figure 1.1.
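
To make these steps concrete, here is a minimal NumPy sketch (not part of the original text; the parameter matrix Theta and the instance x below are made-up placeholders) that computes the class scores, applies the softmax function, and picks the class with the highest estimated probability:

import numpy as np

def softmax(scores):
    """Normalized exponential: map a score vector to class probabilities."""
    exp_s = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp_s / exp_s.sum()

# Made-up parameters: 3 classes (one row of Theta per class) and 4 features.
Theta = np.array([[ 0.2, -0.5,  1.0,  0.1],
                  [ 0.7,  0.3, -0.2,  0.4],
                  [-0.1,  0.8,  0.5, -0.6]])
x = np.array([1.0, 2.0, 0.5, -1.0])

scores = Theta @ x             # s_k(x) = x^T theta_k for every class k
probs = softmax(scores)        # estimated probabilities p_hat_k
y_hat = int(np.argmax(probs))  # predicted class

print(scores, probs, y_hat)

Because the exponential is monotonic, taking the argmax of the probabilities returns the same class as taking the argmax of the raw scores, as stated in the equation above.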

1.2. Loss function


When dealing with probabilities, the function f(p) = −log(p) is commonly used to quantify how close a probability is to 1: it is near 0 when p is close to 1 and grows large as p approaches 0 (for example, −log(0.99) ≈ 0.01 while −log(0.1) ≈ 2.3).

Figure 1.2: − log(x) graph

The objective of training is to have a model that estimates a high probability for the target
class (and consequently a low probability for the other classes). Minimizing the cost function
below, called the cross entropy, should lead to this objective because it penalizes the model when
it estimates a low probability for a target class. Cross entropy is frequently used to measure how
well a set of estimated class probabilities matches the target classes [1].
$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$$

In the above equation, $y_k^{(i)}$ is the target probability that the $i$th instance belongs to class $k$. In
general, it is either equal to 1 or 0, depending on whether the instance belongs to the class or
not. When there are just two classes (K = 2), this cost function is equivalent to the Logistic
Regression’s cost function (logloss).
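As an illustration, here is a minimal NumPy sketch of the cross-entropy cost, assuming one-hot target vectors and an already-computed matrix of estimated probabilities (both arrays below are made-up examples):

import numpy as np

def cross_entropy(Y, P_hat):
    """Cross-entropy J(Theta) for one-hot targets Y and estimated probabilities P_hat.

    Both arrays have shape (m, K): m instances, K classes.
    """
    m = Y.shape[0]
    eps = 1e-12  # guard against log(0)
    return -np.sum(Y * np.log(P_hat + eps)) / m

# Made-up example with 3 instances and 3 classes.
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
P_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])

print(cross_entropy(Y, P_hat))  # small when each true class gets high probability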
In practice, it is common to apply regularization to the loss function in order to penalize large values of θ. This helps prevent overfitting, so the model generalizes better to new data. One such technique is the Elastic Net, which attempts to minimize:
$$L(\Theta) = J(\Theta) + \lambda_1 \sum_{i=1}^{K} \left\|\theta^{(i)}\right\|_1 + \lambda_2 \sum_{i=1}^{K} \left\|\theta^{(i)}\right\|_2^2$$

Since J(Θ) is a convex function [2], L(Θ) is also convex, and therefore Gradient Descent (or any other suitable optimization algorithm) is guaranteed to converge toward the global minimum.
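
For reference, scikit-learn's LogisticRegression can fit a multinomial (softmax) model with an Elastic Net penalty. The sketch below assumes scikit-learn is installed and uses purely illustrative hyperparameter values; note that C is the inverse regularization strength and l1_ratio sets the L1/L2 mix, loosely playing the roles of λ1 and λ2 above:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Softmax regression with an Elastic Net penalty (requires the "saga" solver).
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=10.0, max_iter=5000)
clf.fit(X, y)

print(clf.predict_proba(X[:3]))  # estimated class probabilities
print(clf.predict(X[:3]))        # predicted classes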

1.3. F1 score
The F1 score is often considered a better metric than normal accuracy score to measure model
classification performance, especially in scenarios where class imbalance exists. Accuracy is a
metric that measures the overall correctness of predictions by dividing the number of correct
predictions by the total number of predictions. As a consequence, if a dataset has 95% of
instances belonging to class A and only 5% belonging to class B, a naive classifier that always
predicts class A would achieve 95% accuracy. This classifier completely fails to identify instances
from class B, which may be the class of interest. The F1 score, on the other hand, takes into
account both precision and recall. In a binary classification problem, the F1 score is calculated from the numbers of true positives (TP), false negatives (FN), and false positives (FP):

$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
In multi-class classification, the F1 score is calculated for each class separately in a one-vs-rest manner [3].
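
The sketch below (plain NumPy, with hypothetical labels) computes precision, recall, and F1 for each class in this one-vs-rest fashion:

import numpy as np

def per_class_scores(y_true, y_pred, num_classes):
    """Precision, recall, and F1 for each class, computed one-vs-rest."""
    scores = []
    for k in range(num_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0.0)
        scores.append((precision, recall, f1))
    return scores

# Made-up labels for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0])
for k, (p, r, f1) in enumerate(per_class_scores(y_true, y_pred, 3)):
    print(f"class {k}: precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")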

References

[1] A. G. Sager, Hands-On Machine Learning with Python and TensorFlow. O’Reilly Media.
[2] J. Soch, “Proof: Convexity of the cross-entropy,” The Book of Statistical Proofs, 2020. https://statproofbook.github.io/P/entcross-conv.html
[3] Baeldung, “F-1 Score for Multi-Class Classification,” Baeldung on Computer Science, 2022. https://www.baeldung.com/cs/multi-class-f1-score (accessed February 14, 2024).
