Discussion 5 - Cross Entropy Loss - Annotated

The document discusses loss functions used for classification models, including softmax cross entropy loss and hinge loss. Softmax cross entropy loss interprets raw classifier scores as probabilities between 0 and 1 that sum to 1 and computes the negative log-likelihood of the true class. Hinge loss is zero when the correct class score is the largest and exceeds the other class scores by a clear margin, and non-zero otherwise. The document also reviews regularization, which discourages the model from overfitting the training data, and the notion of entropy from information theory used to define cross entropy loss.

Discussion 5 – Loss Function

Cross Entropy Loss


Recap: Multiclass SVM Loss

Example class scores for three images:

  Class |  Image 1 | Image 2 | Image 3
  Cat   |    3.2   |   1.3   |   2.2
  Car   |    5.1   |   4.9   |   2.5
  Frog  |   -1.7   |   2.0   |  -3.1

Notation: $s_j$ is the score for each of the other classes, $s_{y_i}$ is the score for the correct class, and the margin is 1.

Hinge loss:

  $L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases}$

or equivalently

  $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Squaring the term inside the max gives the squared hinge loss:

  $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$
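As a quick sanity check on the formula above, here is a minimal NumPy sketch of the multiclass SVM loss, applied to the Image 1 scores and assuming (as in the original CS231n example) that the correct class for that image is Cat (index 0):

```python
import numpy as np

def svm_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM (hinge) loss: sum over j != y_i of max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0              # the correct class itself does not contribute
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])           # Cat, Car, Frog scores for Image 1
print(svm_loss(scores, correct_class=0))      # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```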
Recap: Regularization

  $L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$

Data loss: model predictions should match the training data.
Regularization: prevent the model from doing too well on (i.e., overfitting) the training data.
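A minimal sketch of how the full objective combines the two terms, assuming an L2 penalty $R(W) = \sum_{k,l} W_{k,l}^2$ (one common choice; the slide leaves $R$ generic) and a linear score function $f(x_i, W) = W x_i$ with $W$ of shape (classes, features):

```python
import numpy as np

def full_loss(W, X, y, per_example_loss, lam=0.1):
    """L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lam * R(W), with R(W) = sum(W^2)."""
    N = X.shape[0]
    data_loss = sum(per_example_loss(W @ X[i], y[i]) for i in range(N)) / N
    reg_loss = lam * np.sum(W * W)            # L2 regularization penalizes large weights
    return data_loss + reg_loss

# Hypothetical usage with random data and the svm_loss sketch from the previous slide:
# W = np.random.randn(3, 4); X = np.random.randn(5, 4); y = np.array([0, 1, 2, 0, 1])
# print(full_loss(W, X, y, svm_loss))
```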
Cross Entropy Loss: Softmax Classifier

Want to interpret raw classifier scores as probabilities.

Scores: $s = f(x, W)$

Softmax function: $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$

Loss (negative log-likelihood of the true class): $L_i = -\ln P(Y = y_i) = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

Example (the correct class is Cat):

  Class |  score |  exp (unnormalized probability) | normalized probability (sum to 1)
  Cat   |   3.2  |  24.53                          | 0.13
  Car   |   5.1  | 164.02                          | 0.869
  Frog  |  -1.7  |   0.18                          | 0.001

  $L_i = -\ln(0.13) = 2.04$

Slide Credit: Stanford CS231n
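The same computation as a short NumPy sketch. Subtracting the maximum score before exponentiating is a standard numerical-stability trick not shown on the slide; it does not change the result:

```python
import numpy as np

def softmax_cross_entropy(scores, correct_class):
    """L_i = -ln( exp(s_{y_i}) / sum_j exp(s_j) )"""
    shifted = np.exp(scores - scores.max())   # shift by the max for numerical stability
    probs = shifted / shifted.sum()
    return -np.log(probs[correct_class]), probs

scores = np.array([3.2, 5.1, -1.7])           # Cat, Car, Frog
loss, probs = softmax_cross_entropy(scores, correct_class=0)
print(probs)                                  # ~[0.13, 0.87, 0.001]
print(loss)                                   # ~2.04
```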
Entropy: Information Theory

Entropy is a measure of uncertainty:

  $H(P) = -\sum_y P(y) \ln P(y)$

Cross entropy:

  $H(P, Q) = -\sum_y P(y) \ln Q(y)$

KL divergence:

  $D_{KL}(P \| Q) = \sum_y P(y) \ln \dfrac{P(y)}{Q(y)}$

Example: encoding the outcome of two coin flips (HH, TT, HT, TH).

H(Q) -- uniform distribution Q:

  Outcome                     |  HH  |  TT  |  HT  |  TH
  Probability Q               | 0.25 | 0.25 | 0.25 | 0.25
  Bit representation          |  00  |  01  |  10  |  11
  No. of bits, $-\log_2 Q$    |  2   |  2   |  2   |  2
  Contribution $-Q \log_2 Q$  |  0.5 |  0.5 |  0.5 |  0.5

  Expected no. of bits (entropy): $H(Q) = -\sum Q \log_2 Q = 2$

H(P) -- skewed distribution P:

  Outcome                     |  HH  |  TT  |   HT  |   TH
  Probability P               |  0.5 | 0.25 | 0.125 | 0.125
  Bit representation          |  1   |  01  |  000  |  001
  No. of bits, $-\log_2 P$    |  1   |  2   |   3   |   3
  Contribution $-P \log_2 P$  |  0.5 |  0.5 | 0.375 | 0.375

  Expected no. of bits (entropy): $H(P) = -\sum P \log_2 P = 1.75$

H(P, Q) -- true distribution P encoded with the code built for the predicted distribution Q:

  Outcome                     |  HH  |  TT  |  HT  |  TH
  Predicted probability Q     | 0.25 | 0.25 | 0.25 | 0.25
  Bit representation          |  00  |  01  |  10  |  11
  No. of bits, $-\log_2 Q$    |  2   |  2   |  2   |  2
  Contribution $-P \log_2 Q$  |  1   |  0.5 | 0.25 | 0.25

  Expected no. of bits (cross entropy): $H(P, Q) = -\sum P \log_2 Q = 2$
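The numbers in the three tables can be reproduced with a few lines of NumPy; logarithms are taken base 2 here so the results are in bits, whereas the loss formulas elsewhere in these slides use the natural log:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_y P(y) log2 P(y)"""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_y P(y) log2 Q(y)"""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_y P(y) log2(P(y)/Q(y)) = H(P, Q) - H(P)"""
    return np.sum(p * np.log2(p / q))

Q = np.array([0.25, 0.25, 0.25, 0.25])    # uniform distribution over HH, TT, HT, TH
P = np.array([0.50, 0.25, 0.125, 0.125])  # skewed distribution

print(entropy(Q))            # 2.0  bits
print(entropy(P))            # 1.75 bits
print(cross_entropy(P, Q))   # 2.0  bits
print(kl_divergence(P, Q))   # 0.25 bits, i.e. H(P, Q) - H(P)
```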
Cross Entropy Loss: Softmax Classifier

Want to interpret raw classifier scores as probabilities.

  $s = f(x, W)$,   $P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$,   $L_i = -\ln P(Y = y_i) = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

Example (the correct class is Cat). The predicted probabilities Q are compared against the true (one-hot) distribution P:

  Class |  score |  exp (unnormalized) | normalized probability Q | true probability P
  Cat   |   3.2  |  24.53              | 0.13                     | 1
  Car   |   5.1  | 164.02              | 0.869                    | 0
  Frog  |  -1.7  |   0.18              | 0.001                    | 0

The loss is the cross entropy between P and Q:

  $H(P, Q) = -\sum_y P(y) \ln Q(y)$

Slide Credit: Stanford CS231n
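A small sketch of the "compare" step above: because the true distribution P is one-hot, the cross entropy reduces to the negative log of the probability the classifier assigns to the correct class (the values are the slide's rounded probabilities, so the result is approximate):

```python
import numpy as np

Q = np.array([0.13, 0.869, 0.001])   # predicted probabilities for Cat, Car, Frog
P = np.array([1.0, 0.0, 0.0])        # true one-hot distribution: the image is a Cat

H_PQ = -np.sum(P * np.log(Q))        # -sum_y P(y) ln Q(y); all zero entries of P drop out
print(H_PQ)                          # ~2.04
print(-np.log(Q[0]))                 # identical: the negative log-likelihood of the true class
```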
Softmax vs. SVM

Scores are computed as $s = Wx + b$:

  W = [ 0.01  -0.05   0.1   0.05 ]     x = [ -15 ]     b = [  0.0 ]
      [ 0.7    0.2    0.05  0.16 ]         [  22 ]         [  0.2 ]
      [ 0.0   -0.45  -0.2   0.03 ]         [ -44 ]         [ -0.3 ]
                                           [  56 ]

  giving scores $s = [-2.85, 0.86, 0.28]$.

Hinge loss (SVM): computed directly from the raw scores.

Cross entropy loss (Softmax): the scores are exponentiated and normalized before taking the negative log-likelihood:

  score |  exp  | normalized probability
  -2.85 | 0.058 | 0.016
   0.86 | 2.36  | 0.631
   0.28 | 1.32  | 0.353

Slide Credit: Stanford CS231n
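Putting the slide together in one sketch: compute the scores from W, x, and b, then both losses. The extracted slide does not mark the correct class; the sketch assumes it is the third class (score 0.28), which matches how the CS231n example is usually presented:

```python
import numpy as np

W = np.array([[0.01, -0.05,  0.10, 0.05],
              [0.70,  0.20,  0.05, 0.16],
              [0.00, -0.45, -0.20, 0.03]])
x = np.array([-15.0, 22.0, -44.0, 56.0])
b = np.array([0.0, 0.2, -0.3])

s = W @ x + b                          # scores: [-2.85, 0.86, 0.28]
y = 2                                  # assumed correct class (score 0.28)

# Hinge (SVM) loss: works directly on the raw scores
margins = np.maximum(0.0, s - s[y] + 1.0)
margins[y] = 0.0
print(margins.sum())                   # max(0, -2.85-0.28+1) + max(0, 0.86-0.28+1) = 1.58

# Cross entropy (Softmax) loss: exponentiate, normalize, take negative log-likelihood
probs = np.exp(s) / np.exp(s).sum()    # ~[0.016, 0.631, 0.353]
print(-np.log(probs[y]))               # ~1.04
```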
SVM vs. Softmax

SVM (hinge loss): $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
• Zero loss if the correct class score is the largest among all class scores and is clearly separated from the others by the margin.
• Non-zero loss if the correct class score is not the largest, or is not clearly separated by the margin.

Softmax (cross entropy loss): $L_i = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$
• Converts raw scores into probabilities: each is between 0 and 1, and the probabilities of all classes sum to 1.
• The loss is the cross entropy of the predicted probability distribution and the true probability distribution, i.e. the negative log-likelihood of the true class prediction.

Slide Credit: Stanford CS231n
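One practical consequence of these bullets, shown in a short sketch: once the correct class beats every other class by more than the margin, the hinge loss is zero and stops changing, while the cross entropy loss keeps decreasing as the correct-class score grows (the scores here are made up for illustration):

```python
import numpy as np

def hinge(s, y):
    m = np.maximum(0.0, s - s[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def cross_ent(s, y):
    p = np.exp(s - s.max())
    p /= p.sum()
    return -np.log(p[y])

other_scores = [1.0, -2.0]                       # fixed scores for two other classes
for correct in [2.0, 5.0, 10.0]:                 # increasing correct-class score
    s = np.array([correct] + other_scores)
    print(correct, hinge(s, 0), cross_ent(s, 0))
# hinge stays at 0.0 once the margin is met; cross entropy keeps shrinking toward 0
```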
Recap

• A dataset of (x, y) pairs.
• A score function: $s = f(x_i, W)$
• A loss function:
    Softmax: $L_i = -\ln \dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$
    SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
    Full loss (data loss + regularization loss): $L = \dfrac{1}{N} \sum_{i=1}^{N} L_i + R(W)$

Slide Credit: Stanford CS231n
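A minimal end-to-end sketch of this recap on a toy dataset (random inputs and weights, purely illustrative; R(W) is taken to be the L2 penalty from the earlier regularization slide):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))        # 5 examples, 4 features
y = np.array([0, 2, 1, 0, 2])          # true class labels
W = rng.standard_normal((3, 4))        # 3 classes: score function s = f(x_i, W) = W x_i

def softmax_loss(s, yi):
    p = np.exp(s - s.max())
    p /= p.sum()
    return -np.log(p[yi])

data_loss = np.mean([softmax_loss(W @ X[i], y[i]) for i in range(len(y))])
reg_loss = np.sum(W * W)               # R(W), here an L2 penalty
print(data_loss + reg_loss)            # full loss: L = (1/N) * sum_i L_i + R(W)
```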
