Discussion 5 - Cross Entropy Loss - Annotated

The document discusses loss functions used for classification models, including softmax cross entropy loss and hinge loss. Softmax cross entropy loss interprets raw classifier scores as probabilities between 0 and 1 that sum to 1 and computes the negative log-likelihood of the true class. Hinge loss is zero when the correct class score is the largest and exceeds the other class scores by a clear margin, and non-zero otherwise. The document also reviews regularization, which discourages the model from overfitting the training data, and the notion of entropy from information theory used to define cross entropy loss.

Discussion 5 – Loss Function

Cross Entropy Loss


Recap: Multiclass SVM Loss

Example class scores for three images:

  Class |  Image 1 | Image 2 | Image 3
  Cat   |    3.2   |   1.3   |   2.2
  Car   |    5.1   |   4.9   |   2.5
  Frog  |   -1.7   |   2.0   |  -3.1

Notation: $s_j$ is the score for each of the other classes, $s_{y_i}$ is the score for the correct class, and the margin is 1.

Hinge loss:

  $L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases}$

or equivalently

  $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Squaring the term inside the max gives the squared hinge loss:

  $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$
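As a quick sanity check on the formula above, here is a minimal NumPy sketch of the multiclass SVM loss, applied to the Image 1 scores and assuming (as in the original CS231n example) that the correct class for that image is Cat (index 0):

```python
import numpy as np

def svm_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM (hinge) loss: sum over j != y_i of max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0              # the correct class itself does not contribute
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])           # Cat, Car, Frog scores for Image 1
print(svm_loss(scores, correct_class=0))      # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```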
Recap: Regularization

  $L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on (i.e., overfitting) the training data.
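A minimal sketch of how the full objective combines the two terms, assuming an L2 penalty $R(W) = \sum_{k,l} W_{k,l}^2$ (one common choice; the slide leaves $R$ generic) and a linear score function $f(x_i, W) = W x_i$ with $W$ of shape (classes, features):

```python
import numpy as np

def full_loss(W, X, y, per_example_loss, lam=0.1):
    """L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lam * R(W), with R(W) = sum(W^2)."""
    N = X.shape[0]
    data_loss = sum(per_example_loss(W @ X[i], y[i]) for i in range(N)) / N
    reg_loss = lam * np.sum(W * W)            # L2 regularization penalizes large weights
    return data_loss + reg_loss

# Hypothetical usage with random data and the svm_loss sketch from the previous slide:
# W = np.random.randn(3, 4); X = np.random.randn(5, 4); y = np.array([0, 1, 2, 0, 1])
# print(full_loss(W, X, y, svm_loss))
```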
Cross Entropy Loss: Softmax Classifier

Want to interpret raw classifier scores as probabilities.

Scores: $s = f(x, W)$

Softmax function: $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$

Loss (negative log-likelihood of the true class): $L_i = -\ln P(Y = y_i) = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

Example (the correct class is Cat):

  Class |  score |  exp (unnormalized probability) | normalized probability (sum to 1)
  Cat   |   3.2  |  24.53                          | 0.13
  Car   |   5.1  | 164.02                          | 0.869
  Frog  |  -1.7  |   0.18                          | 0.001

  $L_i = -\ln(0.13) = 2.04$

Slide Credit: Stanford CS231n
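The same computation as a short NumPy sketch. Subtracting the maximum score before exponentiating is a standard numerical-stability trick not shown on the slide; it does not change the result:

```python
import numpy as np

def softmax_cross_entropy(scores, correct_class):
    """L_i = -ln( exp(s_{y_i}) / sum_j exp(s_j) )"""
    shifted = np.exp(scores - scores.max())   # shift by the max for numerical stability
    probs = shifted / shifted.sum()
    return -np.log(probs[correct_class]), probs

scores = np.array([3.2, 5.1, -1.7])           # Cat, Car, Frog
loss, probs = softmax_cross_entropy(scores, correct_class=0)
print(probs)                                  # ~[0.13, 0.87, 0.001]
print(loss)                                   # ~2.04
```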
Entropy: Information Theory

Entropy is a measure of uncertainty:

  $H(P) = -\sum_y P(y) \ln P(y)$

Cross entropy:

  $H(P, Q) = -\sum_y P(y) \ln Q(y)$

KL divergence:

  $D_{KL}(P \| Q) = \sum_y P(y) \ln \dfrac{P(y)}{Q(y)}$

Example: encoding the outcome of two coin flips (HH, TT, HT, TH).

H(Q) -- uniform distribution Q:

  Outcome                     |  HH  |  TT  |  HT  |  TH
  Probability Q               | 0.25 | 0.25 | 0.25 | 0.25
  Bit representation          |  00  |  01  |  10  |  11
  No. of bits, $-\log_2 Q$    |  2   |  2   |  2   |  2
  Contribution $-Q \log_2 Q$  |  0.5 |  0.5 |  0.5 |  0.5

  Expected no. of bits (entropy): $H(Q) = -\sum Q \log_2 Q = 2$

H(P) -- skewed distribution P:

  Outcome                     |  HH  |  TT  |   HT  |   TH
  Probability P               |  0.5 | 0.25 | 0.125 | 0.125
  Bit representation          |  1   |  01  |  000  |  001
  No. of bits, $-\log_2 P$    |  1   |  2   |   3   |   3
  Contribution $-P \log_2 P$  |  0.5 |  0.5 | 0.375 | 0.375

  Expected no. of bits (entropy): $H(P) = -\sum P \log_2 P = 1.75$

H(P, Q) -- true distribution P encoded with the code built for the predicted distribution Q:

  Outcome                     |  HH  |  TT  |  HT  |  TH
  Predicted probability Q     | 0.25 | 0.25 | 0.25 | 0.25
  Bit representation          |  00  |  01  |  10  |  11
  No. of bits, $-\log_2 Q$    |  2   |  2   |  2   |  2
  Contribution $-P \log_2 Q$  |  1   |  0.5 | 0.25 | 0.25

  Expected no. of bits (cross entropy): $H(P, Q) = -\sum P \log_2 Q = 2$
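The numbers in the three tables can be reproduced with a few lines of NumPy; logarithms are taken base 2 here so the results are in bits, whereas the loss formulas elsewhere in these slides use the natural log:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_y P(y) log2 P(y)"""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_y P(y) log2 Q(y)"""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_y P(y) log2(P(y)/Q(y)) = H(P, Q) - H(P)"""
    return np.sum(p * np.log2(p / q))

Q = np.array([0.25, 0.25, 0.25, 0.25])    # uniform distribution over HH, TT, HT, TH
P = np.array([0.50, 0.25, 0.125, 0.125])  # skewed distribution

print(entropy(Q))            # 2.0  bits
print(entropy(P))            # 1.75 bits
print(cross_entropy(P, Q))   # 2.0  bits
print(kl_divergence(P, Q))   # 0.25 bits, i.e. H(P, Q) - H(P)
```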
Cross Entropy Loss: Softmax Classifier

Want to interpret raw classifier scores as probabilities.

  $s = f(x, W)$,   $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$,   $L_i = -\ln P(Y = y_i) = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

Example (the correct class is Cat). The predicted probabilities Q are compared against the true (one-hot) distribution P:

  Class |  score |  exp (unnormalized) | normalized probability Q | true probability P
  Cat   |   3.2  |  24.53              | 0.13                     | 1
  Car   |   5.1  | 164.02              | 0.869                    | 0
  Frog  |  -1.7  |   0.18              | 0.001                    | 0

The loss is the cross entropy between P and Q:

  $H(P, Q) = -\sum_y P(y) \ln Q(y)$

Slide Credit: Stanford CS231n
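A small sketch of the "compare" step above: because the true distribution P is one-hot, the cross entropy reduces to the negative log of the probability the classifier assigns to the correct class (the values are the slide's rounded probabilities, so the result is approximate):

```python
import numpy as np

Q = np.array([0.13, 0.869, 0.001])   # predicted probabilities for Cat, Car, Frog
P = np.array([1.0, 0.0, 0.0])        # true one-hot distribution: the image is a Cat

H_PQ = -np.sum(P * np.log(Q))        # -sum_y P(y) ln Q(y); all zero entries of P drop out
print(H_PQ)                          # ~2.04
print(-np.log(Q[0]))                 # identical: the negative log-likelihood of the true class
```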
Softmax vs. SVM

Scores are computed as $s = Wx + b$:

  W = [ 0.01  -0.05   0.1   0.05 ]     x = [ -15 ]     b = [  0.0 ]
      [ 0.7    0.2    0.05  0.16 ]         [  22 ]         [  0.2 ]
      [ 0.0   -0.45  -0.2   0.03 ]         [ -44 ]         [ -0.3 ]
                                           [  56 ]

  giving scores $s = [-2.85, 0.86, 0.28]$.

Hinge loss (SVM): computed directly from the raw scores.

Cross entropy loss (Softmax): the scores are exponentiated and normalized before taking the negative log-likelihood:

  score |  exp  | normalized probability
  -2.85 | 0.058 | 0.016
   0.86 | 2.36  | 0.631
   0.28 | 1.32  | 0.353

Slide Credit: Stanford CS231n
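Putting the slide together in one sketch: compute the scores from W, x, and b, then both losses. The extracted slide does not mark the correct class; the sketch assumes it is the third class (score 0.28), which matches how the CS231n example is usually presented:

```python
import numpy as np

W = np.array([[0.01, -0.05,  0.10, 0.05],
              [0.70,  0.20,  0.05, 0.16],
              [0.00, -0.45, -0.20, 0.03]])
x = np.array([-15.0, 22.0, -44.0, 56.0])
b = np.array([0.0, 0.2, -0.3])

s = W @ x + b                          # scores: [-2.85, 0.86, 0.28]
y = 2                                  # assumed correct class (score 0.28)

# Hinge (SVM) loss: works directly on the raw scores
margins = np.maximum(0.0, s - s[y] + 1.0)
margins[y] = 0.0
print(margins.sum())                   # max(0, -2.85-0.28+1) + max(0, 0.86-0.28+1) = 1.58

# Cross entropy (Softmax) loss: exponentiate, normalize, take negative log-likelihood
probs = np.exp(s) / np.exp(s).sum()    # ~[0.016, 0.631, 0.353]
print(-np.log(probs[y]))               # ~1.04
```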
SVM vs. Softmax

SVM (hinge loss): $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
• Zero loss if the correct class score is the largest among all class scores and is clearly separated from the others by the margin.
• Non-zero loss if the correct class score is not the largest, or is not clearly separated by the margin.

Softmax (cross entropy loss): $L_i = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$
• Converts raw scores into probabilities: each is between 0 and 1, and the probabilities of all classes sum to 1.
• The loss is the cross entropy of the predicted probability distribution and the true probability distribution, i.e. the negative log-likelihood of the true class prediction.

Slide Credit: Stanford CS231n
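One practical consequence of these bullets, shown in a short sketch: once the correct class beats every other class by more than the margin, the hinge loss is zero and stops changing, while the cross entropy loss keeps decreasing as the correct-class score grows (the scores here are made up for illustration):

```python
import numpy as np

def hinge(s, y):
    m = np.maximum(0.0, s - s[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def cross_ent(s, y):
    p = np.exp(s - s.max())
    p /= p.sum()
    return -np.log(p[y])

other_scores = [1.0, -2.0]                       # fixed scores for two other classes
for correct in [2.0, 5.0, 10.0]:                 # increasing correct-class score
    s = np.array([correct] + other_scores)
    print(correct, hinge(s, 0), cross_ent(s, 0))
# hinge stays at 0.0 once the margin is met; cross entropy keeps shrinking toward 0
```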
Recap

• A dataset of (x, y) pairs.
• A score function: $s = f(x_i, W)$
• A loss function:
    Softmax: $L_i = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$
    SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
    Full loss (data loss + regularization loss): $L = \dfrac{1}{N} \sum_{i=1}^{N} L_i + R(W)$

Slide Credit: Stanford CS231n
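A minimal end-to-end sketch of this recap on a toy dataset (random inputs and weights, purely illustrative; R(W) is taken to be the L2 penalty from the earlier regularization slide):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))        # 5 examples, 4 features
y = np.array([0, 2, 1, 0, 2])          # true class labels
W = rng.standard_normal((3, 4))        # 3 classes: score function s = f(x_i, W) = W x_i

def softmax_loss(s, yi):
    p = np.exp(s - s.max())
    p /= p.sum()
    return -np.log(p[yi])

data_loss = np.mean([softmax_loss(W @ X[i], y[i]) for i in range(len(y))])
reg_loss = np.sum(W * W)               # R(W), here an L2 penalty
print(data_loss + reg_loss)            # full loss: L = (1/N) * sum_i L_i + R(W)
```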
