
ITCS 6156/8156 Fall 2023

Machine Learning

Logistic Regression

Instructor: Hongfei Xue


Email: [email protected]
Class Meeting: Mon & Wed, 4:00 PM – 5:15 PM, CHHS 376

Some content in the slides is based on Dr. Razvan’s and Dr. Andrew’s lectures
Logistic Regression

[Diagram: Supervised Learning, split by type of output]
• Regression (continuous output) → Linear Regression
• Classification (discrete output) → Logistic Regression
Classification

Question                         Answer "y"
Is this email spam?              no / yes
Is the transaction fraudulent?   no / yes
Is the tumor malignant?          no / yes

• Binary classification:
  • "y" can only be one of two values:
    - false: 0: "negative class" = "absence"
    - true: 1: "positive class" = "presence"
Linear Regression Approach

f_{w,b}(x) = w x + b    (with b = w_0)

[Plot: malignant? ((No) 0 / (Yes) 1) vs. tumor size x (diameter in cm); threshold 0.5, with ŷ = 1 on one side and ŷ = 0 on the other]

if f_{w,b}(x) < 0.5 → ŷ = 0
if f_{w,b}(x) ≥ 0.5 → ŷ = 1
Logistic Function

[Plot: malignant? ((No) 0 / (Yes) 1) vs. tumor size x (diameter in cm), with threshold 0.7]

• Probabilistic Discriminative Models: directly model the posterior class probabilities p(C | x⃗; w⃗, b)
Logistic Function

• Want outputs between 0 and 1

z = w⃗·x⃗ + b
g(z) = 1 / (1 + e^{−z})

f_{w,b}(x⃗) = g(w⃗·x⃗ + b) = 1 / (1 + e^{−(w⃗·x⃗ + b)})

[Plot: g(z) vs. z from −3 to 3; g crosses 0.5 at z = 0]

Logistic regression:
• sigmoid function / logistic function
• outputs between 0 and 1: g(z) = 1 / (1 + e^{−z}), with 0 < g(z) < 1
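A minimal NumPy sketch (not part of the original slides) of the sigmoid and the logistic model f_{w,b}(x⃗) = g(w⃗·x⃗ + b); the weights, bias, and inputs are made-up values for illustration.

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # f_{w,b}(x) = g(w . x + b) for each row of X
    return sigmoid(X @ w + b)

# made-up example: two features, three samples
X = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]])
w = np.array([1.0, 1.0])
b = -3.0
print(predict_proba(X, w, b))   # all outputs lie strictly between 0 and 1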
Decision Boundary

f_{w,b}(x⃗) = g(z) = g(w_1 x_1 + w_2 x_2 + b)

Decision boundary: z = w⃗·x⃗ + b = 0
  (set w_1 = 1, w_2 = 1, b = −3)
  z = x_1 + x_2 − 3 = 0, i.e., x_1 + x_2 = 3

[Plot: x_2 vs. x_1 (axes 0 to 3), with the line x_1 + x_2 = 3 separating the two classes]

The decision boundary is a hyperplane: f(x⃗) = 0.5 ⟺ z = 0


Non-linear Decision Boundary

z = w_1 x_1² + w_2 x_2² + b

Decision boundary:
  (set w_1 = 1, w_2 = 1, b = −1)
  z = x_1² + x_2² − 1 = 0
  x_1² + x_2² = 1

[Plot: x_2 vs. x_1 (axes from −1 to 1), with the unit circle x_1² + x_2² = 1 as the boundary]
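A short illustrative sketch (not from the slides) checking both boundaries above: the linear boundary x_1 + x_2 = 3, and the circular boundary x_1² + x_2² = 1 obtained by feeding squared features into the same logistic model. The test points are made up.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# linear boundary: z = x1 + x2 - 3  (w1 = w2 = 1, b = -3)
def predict_linear(x1, x2):
    return (sigmoid(x1 + x2 - 3.0) >= 0.5).astype(int)

# non-linear boundary: z = x1^2 + x2^2 - 1  (squared features, w1 = w2 = 1, b = -1)
def predict_circle(x1, x2):
    return (sigmoid(x1**2 + x2**2 - 1.0) >= 0.5).astype(int)

x1 = np.array([1.0, 2.5, 0.1])
x2 = np.array([1.0, 2.5, 0.2])
print(predict_linear(x1, x2))   # [0 1 0]: only points with x1 + x2 >= 3 get class 1
print(predict_circle(x1, x2))   # [1 1 0]: points outside the unit circle get class 1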
Loss Function

Training Set:

  tumor size (cm)   …   patient's age   malignant?
  x_1               …   x_n             y
  10                …   52              1
  2                 …   73              0
  5                 …   55              0
  12                …   49              1
  …                 …   …               …

i = 1, 2, ⋯, m   (m: number of training samples)
j = 1, 2, ⋯, n   (n: number of features)
target y is 0 or 1

f_{w,b}(x⃗) = 1 / (1 + e^{−(w⃗·x⃗ + b)})

How to choose w⃗ = [w_1, w_2, ⋯, w_n] and b?
Loss Function

• Squared Error Cost:


J(w, b) = (1/m) ∑_{i=1}^{m} (1/2) (f_{w,b}(x⃗^{(i)}) − y^{(i)})²

- Differentiable => can use gradient descent


- Non-convex => not guaranteed to find the global optimum
Loss Function

• Logistic Loss Function:

L(f_{w,b}(x⃗^{(i)}), y^{(i)}) = −log(f_{w,b}(x⃗^{(i)}))        if y^{(i)} = 1
                               −log(1 − f_{w,b}(x⃗^{(i)}))    if y^{(i)} = 0

if y^{(i)} = 1: as f_{w,b}(x⃗^{(i)}) → 1, loss → 0; as f_{w,b}(x⃗^{(i)}) → 0, loss → ∞
if y^{(i)} = 0: as f_{w,b}(x⃗^{(i)}) → 1, loss → ∞; as f_{w,b}(x⃗^{(i)}) → 0, loss → 0
Simplified Loss Function

• Logistic Loss Function:

L(f_{w,b}(x⃗^{(i)}), y^{(i)}) = −log(f_{w,b}(x⃗^{(i)}))        if y^{(i)} = 1
                               −log(1 − f_{w,b}(x⃗^{(i)}))    if y^{(i)} = 0

• Simplified Logistic Loss Function (Convex):

L(f_{w,b}(x⃗^{(i)}), y^{(i)}) = −y^{(i)} log(f_{w,b}(x⃗^{(i)})) − (1 − y^{(i)}) log(1 − f_{w,b}(x⃗^{(i)}))

• Overall:

J(w, b) = (1/m) ∑_{i=1}^{m} L(f_{w,b}(x⃗^{(i)}), y^{(i)})
        = −(1/m) ∑_{i=1}^{m} [ y^{(i)} log(f_{w,b}(x⃗^{(i)})) + (1 − y^{(i)}) log(1 − f_{w,b}(x⃗^{(i)})) ]

• Can be derived from Maximum Likelihood.
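A small NumPy sketch (illustrative) of the simplified logistic loss and the overall cost J(w, b); the tiny one-feature dataset and parameter values are made up.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b, eps=1e-12):
    # J(w, b) = -(1/m) * sum[ y*log(f) + (1 - y)*log(1 - f) ]
    f = sigmoid(X @ w + b)
    f = np.clip(f, eps, 1.0 - eps)          # avoid log(0)
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))

# made-up data: one feature (e.g., tumor size), binary labels
X = np.array([[10.0], [2.0], [5.0], [12.0]])
y = np.array([1, 0, 0, 1])
print(cost(X, y, w=np.array([0.5]), b=-3.0))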


Gradient Descent
• Overall Loss (Cost):

J(w, b) = −(1/m) ∑_{i=1}^{m} [ y^{(i)} log(f_{w,b}(x⃗^{(i)})) + (1 − y^{(i)}) log(1 − f_{w,b}(x⃗^{(i)})) ]

• Gradient Descent (compared with Linear Regression: the update rule has the same form, but here f_{w,b}(x⃗) is the sigmoid of w⃗·x⃗ + b):

Repeat {
  w_j := w_j − α ∂J(w, b)/∂w_j,
    where ∂J(w, b)/∂w_j = (1/m) ∑_{i=1}^{m} (f_{w,b}(x⃗^{(i)}) − y^{(i)}) x_j^{(i)}
  b := b − α ∂J(w, b)/∂b,
    where ∂J(w, b)/∂b = (1/m) ∑_{i=1}^{m} (f_{w,b}(x⃗^{(i)}) − y^{(i)})
} simultaneous updates
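A hedged NumPy sketch of the batch gradient-descent updates above (simultaneous updates of w and b); the learning rate, iteration count, and toy data are arbitrary choices for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        f = sigmoid(X @ w + b)                 # predictions f_{w,b}(x^(i))
        dw = (X.T @ (f - y)) / m               # dJ/dw_j = (1/m) * sum (f - y) * x_j
        db = np.mean(f - y)                    # dJ/db   = (1/m) * sum (f - y)
        w, b = w - alpha * dw, b - alpha * db  # simultaneous update
    return w, b

# toy data: class 1 when x1 + x2 is large
X = np.array([[0.5, 0.5], [1.0, 1.0], [2.5, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])
w, b = gradient_descent(X, y)
print(np.round(sigmoid(X @ w + b)))   # should approach [0, 0, 1, 1]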
Bias & Variance
• Bias and Variance are two fundamental concepts in machine learning that
pertain to the errors associated with predictive models.
• Bias: The differences between actual or expected values and the predicted
values are known as bias error or error due to bias. Bias is a systematic error
that occurs due to wrong assumptions in the machine learning process.
• Low Bias: In this case, the model will closely match the training dataset.
• High Bias: If a model has high bias, this means it can't capture the
patterns in the data, no matter how much you train it. The model is too
simplistic. This scenario is often referred to as underfitting.
• Variance: Variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, it measures how sensitive the model is to the particular subset of training data it was fit on (i.e., how much the learned function changes when the model is refit on a new subset of the training data).
  • Low Variance: Low variance means that the model is less sensitive to changes in the training data and produces consistent estimates of the target function across different subsets of data from the same distribution.
  • High Variance: High variance means that the model is very sensitive to changes in the training data, so the estimate of the target function can change significantly when the model is trained on different subsets of data from the same distribution.
Polynomial Regression Examples

[Figure: polynomial fits of degree M = 1, M = 3, and M = 9 to the same data]

• M = 1: Underfitting. Does not fit the training set well; cannot fit the test set well. High bias.
• M = 3: Just right. Fits the training set pretty well; fits the test set well. Good generalization.
• M = 9: Overfitting. Fits the training set extremely well; cannot fit the test set well. High variance.
Classification Examples

• Underfitting: z = w_1 x_1 + w_2 x_2 + b
• Just right: z = w_1 x_1 + w_2 x_2 + w_3 x_1² + w_4 x_2² + w_5 x_1 x_2 + b
• Overfitting: z = w_1 x_1³ + w_2 x_2³ + w_3 x_1² + w_4 x_2² + w_5 x_1 x_2 + ⋯ + b (many high-order polynomial terms)
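One way (an illustrative sketch, not prescribed by the slides) to build the "just right" quadratic feature set x_1, x_2, x_1², x_1·x_2, x_2² before fitting the logistic model; scikit-learn's PolynomialFeatures and LogisticRegression are used as a convenience, and the circular toy data is made up.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# made-up 2-D points labeled by whether they fall inside a circle
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)

# degree-2 mapping: x1, x2, x1^2, x1*x2, x2^2 (the "just right" feature set)
quad = PolynomialFeatures(degree=2, include_bias=False)
Xq = quad.fit_transform(X)

clf = LogisticRegression(max_iter=1000).fit(Xq, y)
print(clf.score(Xq, y))   # training accuracy; should be high, since a degree-2 model can represent a circle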


Dealing with Overfitting

• Collect more training examples


Dealing with Overfitting

• Select features to include/exclude:
  • 100 features → 10 features
  • 100 features + insufficient data → overfitting
  • 10 features + same data → just right (possibly)

• Disadvantage:
  • Useful features could be lost
Regularization

• Reduce the size of parameters w

f(x) = 28x − 385x² + 39x³ − 174x⁴ + 100
vs.
f(x) = 13x − 0.23x² + 0.000014x³ − 0.0001x⁴ + 10   (smaller parameters)
Regularized Logistic Regression
• Overall Loss with Regularizer:

J(w, b) = −(1/m) ∑_{i=1}^{m} [ y^{(i)} log(f_{w,b}(x⃗^{(i)})) + (1 − y^{(i)}) log(1 − f_{w,b}(x⃗^{(i)})) ] + (λ/(2m)) ∑_{j=1}^{n} w_j²

• Gradient Descent:
Repeat {
  w_j := w_j − α ∂J(w, b)/∂w_j,
    where ∂J(w, b)/∂w_j = (1/m) ∑_{i=1}^{m} (f_{w,b}(x⃗^{(i)}) − y^{(i)}) x_j^{(i)} + (λ/m) w_j
  b := b − α ∂J(w, b)/∂b,
    where ∂J(w, b)/∂b = (1/m) ∑_{i=1}^{m} (f_{w,b}(x⃗^{(i)}) − y^{(i)})
} simultaneous updates
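A sketch of the regularized update above in NumPy (illustrative; λ, α, and the iteration count are arbitrary). Note that only the w_j updates get the (λ/m)·w_j term; b is not regularized.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_gradient_descent(X, y, lam=1.0, alpha=0.1, iters=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        f = sigmoid(X @ w + b)
        dw = (X.T @ (f - y)) / m + (lam / m) * w   # extra term shrinks each w_j toward 0
        db = np.mean(f - y)                        # b is not regularized
        w, b = w - alpha * dw, b - alpha * db      # simultaneous update
    return w, b

For reference, scikit-learn's LogisticRegression applies L2 regularization by default, with its C parameter acting as the inverse of the regularization strength (roughly 1/λ).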
Machine Learning Objective

• Find a model M:
• that fits the training data + that is simple

• Inductive hypothesis: Models that perform well on


training examples are expected to do well on test (unseen)
examples.
• Occam's Razor: Simpler models are expected to do better
than complex models on test examples (assuming similar
training performance).
Algebraic Interpretation

• The output of the neuron is a linear combination of inputs


from other neurons, rescaled by the weights.
• summation corresponds to combination of signals
• It is often transformed through an activation/output
function.
Binary Classification

• Test dataset for evaluation:
  • In a binary classification dataset, each instance has its true label (true class): Positive Class (P) vs. Negative Class (N).
• Predictions on the test dataset:
  • [Figure] A perfect classifier
  • [Figure] A real-world classifier

Images from https://classeval.wordpress.com/introduction/basic-evaluation-measures/


Confusion Matrix

• Confusion matrix (a 2x2 table) is composed of four outcomes of classification:


• True positive (TP): correct positive prediction
• False positive (FP): incorrect positive prediction
• True negative (TN): correct negative prediction
• False negative (FN): incorrect negative prediction

                   Predicted Positive   Predicted Negative
True Positive      # of TPs             # of FNs
True Negative      # of FPs             # of TNs
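A small sketch (with made-up labels) that tallies the four outcomes from true and predicted labels.

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # made-up ground truth
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # made-up predictions

tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # correct positive predictions
fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # incorrect positive predictions
tn = int(np.sum((y_true == 0) & (y_pred == 0)))   # correct negative predictions
fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # incorrect negative predictions
print(tp, fp, tn, fn)   # 3 1 3 1 for this toy example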
Basic Measurements

• Accuracy is calculated as the number of all correct predictions divided by the total number of instances in the dataset.
• Precision is calculated as the number of correct positive predictions divided by the total number of positive predictions.
• Recall (sensitivity, true positive rate) is calculated as the number of correct positive predictions divided by the total number of positives.
• F1 Score is the harmonic mean of precision and recall.
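Continuing the toy counts from the confusion-matrix sketch above (still illustrative), the four measures follow directly from TP, FP, TN, and FN.

tp, fp, tn, fn = 3, 1, 3, 1   # counts from the toy confusion matrix above

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # all correct / all predictions
precision = tp / (tp + fp)                                   # correct positives / predicted positives
recall    = tp / (tp + fn)                                   # correct positives / actual positives
f1        = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)   # 0.75 0.75 0.75 0.75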
Multi-class Classification

• Multi-class Classification:
  • To classify instances into one of more than two classes (i.e., there are more than two possible categories or labels).
• Strategies:
  • One-vs-All (One-vs-Rest)
  • One-vs-One
  • Softmax Regression (later)
  • Decision Trees (later)

[Figures: binary classification vs. multi-class classification]

Images from: https://utkuufuk.com/2018/06/03/one-vs-all-classification/


One-vs-All

• One-vs-all classification breaks the N classes present in the dataset down into N binary classifier models, each of which aims to classify a data point as either part of its class or not.
• Suppose you have classes 1, 2, and 3.
• Model A: 1 or 2,3 (1 or not 1)
• Model B: 2 or 1,3 (2 or not 2)
• Model C: 3 or 1,2 (3 or not 3)

• At prediction time, the class that corresponds


to the classifier with the highest confidence
score is the predicted class.
• Model A: 𝑃(𝑥 = 1) and 𝑃(𝑥 ≠ 1)
• Model B: 𝑃(𝑥 = 2) and 𝑃(𝑥 ≠ 2)
• Model C: 𝑃(𝑥 = 3) and 𝑃(𝑥 ≠ 3)
• Among 𝑃(𝑥 = 1) , 𝑃(𝑥 = 2) , and
𝑃(𝑥 = 3) , which one is the highest?

Images from: https://www.cc.gatech.edu/classes/AY2016/cs4476_fall/results/proj4/html/jnanda3/index.html
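A hedged sketch of one-vs-all using logistic regression for the per-class models: fit one binary classifier per class, then predict the class whose classifier reports the highest probability. scikit-learn's LogisticRegression is used here as a convenience, and the 3-class toy data is made up.

import numpy as np
from sklearn.linear_model import LogisticRegression

# made-up 3-class data in 2-D (one cluster per class)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([1, 2, 3], 30)

# one binary model per class: "class k" vs. "not class k"
models = {k: LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int)) for k in (1, 2, 3)}

def predict_ova(x):
    # pick the class whose classifier reports the highest P(class = k | x)
    scores = {k: m.predict_proba(x.reshape(1, -1))[0, 1] for k, m in models.items()}
    return max(scores, key=scores.get)

print(predict_ova(np.array([2.9, 0.1])))   # expected to be class 2 for this toy data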


One-vs-one

• One-vs-one classification breaks the N classes present in the dataset down into N*(N-1)/2 binary classifier models, one for each pair of classes.

• Suppose you have classes 1, 2, and 3.


• Model A: 1 or 2
• Model B: 1 or 3
• Model C: 2 or 3

• At prediction time, each classifier votes for a


class, and the class with the most votes is the
predicted class.
• Model A: Vote for 1 or 2
• Model B: Vote for 1 or 3
• Model C: Vote for 2 or 3
• Classes 1, 2, and 3, which one has the
most votes?
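A similar illustrative sketch for one-vs-one voting (it reuses the made-up 3-class setup from the one-vs-all sketch): train one binary model per pair of classes, then let each model vote and take the majority.

import numpy as np
from itertools import combinations
from collections import Counter
from sklearn.linear_model import LogisticRegression

# made-up 3-class data in 2-D (one cluster per class)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([1, 2, 3], 30)

# one model per pair of classes: N*(N-1)/2 = 3 models here
pair_models = {}
for a, b in combinations([1, 2, 3], 2):
    mask = (y == a) | (y == b)
    pair_models[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

def predict_ovo(x):
    # each pairwise model votes for one of its two classes; the majority wins
    votes = [int(m.predict(x.reshape(1, -1))[0]) for m in pair_models.values()]
    return Counter(votes).most_common(1)[0][0]

print(predict_ovo(np.array([0.2, 2.8])))   # expected to be class 3 for this toy data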
Questions?
