
CSCI417

Machine Learning

Lecture # 6
Spring 2024

1
Tentative Course Topics

1. Machine Learning Basics
2. Classifying with k-Nearest Neighbors
3. Splitting datasets one feature at a time: decision trees
4. Classifying with probability theory: naïve Bayes
5. Linear/Logistic regression
6. Support vector machines
7. Model Evaluation and Improvement: cross-validation, grid search, evaluation metrics, and scoring
8. Ensemble learning and improving classification with the AdaBoost meta-algorithm
9. Introduction to Neural Networks: building NNs for classification (binary/multiclass)
10. Convolutional Neural Networks (CNN)
11. Pretrained models (VGG, AlexNet, ...)
12. Machine learning pipeline and use cases

2
Agenda
• Classification Problem: Logistic Regression
• Regularization
  - Regularization for Linear Regression
  - Regularization for Logistic Regression

3
Classification Problem:

Logistic Regression

4
Classification
- Discrete outcomes.
- Binary: y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class
  (e.g., normal / abnormal).
- Multi-class: e.g., a telescope that identifies whether an object in the night
  sky is a galaxy, a star, or a planet.

5
Hypothesis Representation
• Logistic regression model:

  h_θ(x) = g(θᵀx),  where  g(z) = 1 / (1 + e⁻ᶻ)

  g is the sigmoid (logistic) function.

[Figure: sigmoid curve g(θᵀx) plotted against θᵀx; it crosses 0.5 at θᵀx = 0 and asymptotes to 0 and 1]

• We want our classifier to output values between 0 and 1.
• With linear regression we used h_θ(x) = θᵀx.
• For the classification hypothesis we use h_θ(x) = g(θᵀx).
7
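A minimal sketch of the hypothesis above, assuming NumPy; the names `sigmoid` and `hypothesis` are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z}); outputs lie in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

# The sigmoid crosses 0.5 exactly when theta^T x = 0.
print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```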
Interpretation of Hypothesis Output
• h_θ(x) gives us the probability that the output is 1:
  h_θ(x) = P(y = 1 | x; θ)
For example, h_θ(x) = 0.7 gives us a probability of 70% that the output is 1.

• The probability that the prediction is 0 is just the complement of the probability that it is 1.
For example, if the probability that y = 1 is 0.7, then
the probability that y = 0 is 1 − 0.7 = 0.3.

8
Interpretation of Hypothesis Output
• h_θ(x) gives us the probability that the output is 1.
For example, with features x = [1, tumourSize]ᵀ and h_θ(x) = 0.7,
there is a 70% chance of the tumor being malignant.

• P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
  h_θ(x) = P(y = 1 | x; θ): the probability that y = 1, given x, parameterized by θ.

9
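A small sketch of this interpretation; the parameter values and tumour size below are made-up placeholders chosen so that h_θ(x) comes out near 0.7, matching the slide's example.

```python
import numpy as np

# Hypothetical parameters and input x = [1, tumourSize]; the numbers are illustrative only.
theta = np.array([-6.0, 0.12])
x = np.array([1.0, 57.0])            # x_0 = 1 (intercept term), x_1 = tumour size

p_malignant = 1.0 / (1.0 + np.exp(-theta @ x))   # h_theta(x) = P(y = 1 | x; theta), roughly 0.7
p_benign = 1.0 - p_malignant                     # P(y = 0 | x; theta), the complement

# A common decision rule: predict "malignant" (y = 1) when h_theta(x) >= 0.5.
prediction = int(p_malignant >= 0.5)
print(p_malignant, p_benign, prediction)
```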
Binary logistic regression
• We have a set of feature vectors X with corresponding binary outputs y ∈ {0, 1}.

• We want to model p(y|x).

• By definition, a probability is restricted to the range [0, 1].
• We want to transform the probability to remove this range restriction, so that the
transformed quantity can take any real value.

10
Using ODDS
• We have a set of feature vectors X with corresponding binary outputs y ∈ {0, 1},
and we want to model p(y|x).

• By definition, the probability p is restricted to [0, 1].
• For p ∈ (0, 1), the odds p / (1 − p) remove the upper bound (they range over (0, ∞)),
and the log-odds, log(p / (1 − p)), can take any real value.

11
Hypothesis function (proof)
• We want to model p(y|x) with a linear function of the features, but a probability is
restricted to [0, 1] while θᵀx can take any real value.
• The transformation above (probability → odds → log-odds) removes the range
restriction, so we can set the log-odds equal to θᵀx and solve back for p
(see the derivation sketched below).

12
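The derivation sketched on this slide can be written out as follows; this is the standard odds/log-odds argument for the logistic hypothesis, reconstructed here since the slide's equations are not reproduced in the text.

```latex
\begin{align*}
p &= P(y = 1 \mid x) \in (0, 1) \\
\text{odds:}\quad \frac{p}{1-p} &\in (0, \infty) \\
\text{log-odds (logit):}\quad \log\frac{p}{1-p} &\in (-\infty, \infty) \\
\text{model the logit linearly:}\quad \log\frac{p}{1-p} &= \theta^{T}x \\
\frac{p}{1-p} &= e^{\theta^{T}x} \\
p &= \frac{e^{\theta^{T}x}}{1+e^{\theta^{T}x}}
   = \frac{1}{1+e^{-\theta^{T}x}} = h_{\theta}(x)
\end{align*}
```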
Hypothesis function

14
Maximum Likelihood Estimation (MLE)

15
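The slide's equations are not reproduced in the text, but the standard MLE formulation for logistic regression maximizes the log-likelihood, or equivalently minimizes the average negative log-likelihood (cross-entropy). A minimal sketch, with all names being illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_log_likelihood(theta, X, y):
    """Cost J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ].

    X: (m, n) design matrix, y: (m,) vector of 0/1 labels.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)
    # Clip to avoid log(0) for numerically extreme predictions.
    h = np.clip(h, 1e-12, 1.0 - 1e-12)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```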
Gradient Descent for Logistic Regression

16
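A minimal sketch of batch gradient descent for the cost above; the learning rate, iteration count, and function names are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for (unregularized) logistic regression.

    Update rule: theta_j := theta_j - (alpha/m) * sum_i (h_theta(x_i) - y_i) * x_ij
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)             # predictions, shape (m,)
        gradient = (X.T @ (h - y)) / m     # shape (n,)
        theta -= alpha * gradient
    return theta
```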
Multiclass Classification (one-vs-all)
• Train a separate logistic regression classifier h_θ^(i)(x) for each class i,
treating class i as the positive class and everything else as "not class i":
  h_θ^(i)(x) = P(y = i | x; θ)
• For a new input x, pick the class i that maximizes h_θ^(i)(x).

Suppose you have a multi-class classification problem with k classes
(so y ∈ {1, 2, ⋯, k}). Using the one-vs-all method, how many different
logistic regression classifiers will you end up training?
23
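A sketch of one-vs-all built on top of a binary trainer such as the `gradient_descent` helper above (one classifier per class, i.e. k classifiers in total). Classes are assumed encoded as 0, …, k−1 here, and the function names are illustrative.

```python
import numpy as np

def one_vs_all(X, y, num_classes, train_binary):
    """Train one binary logistic regression classifier per class (k classifiers total).

    train_binary(X, y_binary) should return a parameter vector theta.
    """
    all_theta = []
    for i in range(num_classes):
        y_binary = (y == i).astype(float)    # class i vs. "not class i"
        all_theta.append(train_binary(X, y_binary))
    return np.array(all_theta)               # shape (k, n)

def predict_one_vs_all(all_theta, X):
    """Pick the class i whose classifier output h_theta^(i)(x) is largest."""
    scores = 1.0 / (1.0 + np.exp(-(X @ all_theta.T)))   # (m, k) probabilities
    return np.argmax(scores, axis=1)
```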
The Problem of Overfitting
• Underfitting, or high bias, is when the
form of our hypothesis function
maps poorly to the trend of the data.
It is usually caused by a function that is
too simple or uses too few features.
• At the other extreme, overfitting, or
high variance, is caused by a
hypothesis function that fits the
available data but does not generalize
well to predict new data. It is usually
caused by a complicated function that
creates a lot of unnecessary curves and
angles unrelated to the data.

24
Addressing overfitting
• There are two main options to address the issue of overfitting:
1) Reduce the number of features:
– Manually select which features to keep.
– Use a model selection algorithm.
2) Regularization
– Keep all the features, but reduce the magnitude of the parameters θⱼ.
– Regularization works well when we have a lot of slightly useful features.

25
Regularization (To avoid overfitting)
Regularization for Linear Regression

26
Regularization Intuition

[Figure: two fits of housing Price vs. Size of house, one smooth and one overfit by extra higher-order terms]

• Suppose we penalize some of the parameters (the higher-order ones in the fit above) and make them really small.

• Small values for the parameters give
  - a simpler hypothesis
  - that is less prone to overfitting.
Regularization for Linear Regression
• Simpler hypothesis ⇒ small values of θ₁, θ₂, …, θₙ

  J(θ) = (1 / 2m) [ Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ θⱼ² ]

  Minimizing J(θ) now favours choosing small θⱼ.

• λ (lambda) is the regularization parameter.
  It determines how much the costs of our theta parameters are inflated.
• Example: we have two candidate sets of parameters, θ = [1.35, 3.5] and θ = [45.2, 75.6].
  - If λ is chosen to be 0, the cost function acts as usual, with no penalty on θ
    ⇒ the large values θ = [45.2, 75.6] may be chosen.
  - If λ is chosen to be large, small values of θ are chosen instead of large ones
    ⇒ the small values θ = [1.35, 3.5] are chosen.
28
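A minimal sketch of the regularized cost above, assuming (as the sum from j = 1 suggests, and as is conventional) that the bias term θ₀ is not penalized; the function and variable names are illustrative.

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [ sum_i (h(x_i) - y_i)^2 + lam * sum_{j>=1} theta_j^2 ]."""
    m = X.shape[0]
    residuals = X @ theta - y                  # h_theta(x_i) - y_i for each example
    penalty = lam * np.sum(theta[1:] ** 2)     # theta_0 is conventionally not regularized
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```

With lam = 0 this reduces to the ordinary least-squares cost; larger lam makes large parameter values increasingly expensive.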
Regularization for Linear Regression

  J(θ) = (1 / 2m) [ Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ θⱼ² ]

• What if λ is set to an extremely large value (perhaps too large for our problem,
say λ = 10¹⁰)?
⇒ It may smooth out the function too much and cause underfitting. Why?
With such a huge penalty, all parameters θ₁, …, θₙ are driven towards 0,
leaving h_θ(x) ≈ θ₀: a flat line that underfits the data.

[Figure: Price vs. Size of house with the training points and a nearly flat fitted line, labelled "underfit"]
29
Regularization for Linear Regression
• Gradient descent (repeat until convergence):

  θ₀ := θ₀ − α (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾
  θⱼ := θⱼ (1 − α λ/m) − α (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾     (j = 1, …, n)

  The factor (1 − α λ/m) is slightly less than 1, so each update shrinks θⱼ a little
  before applying the usual gradient step.
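A sketch of this regularized update rule; note the (1 − αλ/m) shrinkage factor, which is the "< 1" quantity referred to above. The default hyperparameter values are placeholders.

```python
import numpy as np

def regularized_gradient_descent(X, y, lam, alpha=0.01, num_iters=1000):
    """Gradient descent for regularized linear regression.

    theta_0 := theta_0 - (alpha/m) * sum_i (h(x_i) - y_i) * x_i0
    theta_j := theta_j * (1 - alpha*lam/m) - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij   (j >= 1)
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y                   # (m,) residuals
        grad = (X.T @ error) / m                # (n,) unregularized gradient
        theta[0] -= alpha * grad[0]             # bias term: no shrinkage
        theta[1:] = theta[1:] * (1 - alpha * lam / m) - alpha * grad[1:]
    return theta
```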
Regularization for Logistic Regression

[Figure: a two-feature (x₁, x₂) dataset with a decision boundary; regularization keeps the boundary from becoming overly complex]

Cost function:

  J(θ) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] + (λ / 2m) Σⱼ₌₁ⁿ θⱼ²

The added penalty term again encourages small values of the parameters.
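Finally, a minimal sketch of the regularized logistic regression cost above; θ₀ is again left out of the penalty, and all names are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """J(theta) = -(1/m) sum[ y*log(h) + (1-y)*log(1-h) ] + (lam/2m) sum_{j>=1} theta_j^2."""
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), 1e-12, 1.0 - 1e-12)   # clip to avoid log(0)
    cross_entropy = -(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)).mean()
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)    # theta_0 not penalized
    return cross_entropy + penalty
```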
