Logistic Regression

The document discusses logistic regression as a statistical method for binary classification, emphasizing its use of the logistic function to predict probabilities of outcomes. It explains concepts such as log odds, odds ratios, and the assumptions necessary for logistic regression, alongside the distinction between binary and multinomial logistic regression for multi-class classification. Additionally, it covers evaluation metrics for regression and classification models, highlighting the importance of generalization to unknown examples.


Logistic Regression

Minati Rath
Classification

Email: Spam / Not Spam?
Online Transactions: Fraudulent / Genuine?
Tumor: Malignant / Benign?

0: “Negative Class” (e.g., benign tumor)


1: “Positive Class” (e.g., malignant tumor)

[Figure: training examples plotted as Malignant? (y-axis: 0 = No, 1 = Yes) against Tumor Size (x-axis)]

Can we solve the problem using linear regression?


Classification
[Figure: the same Malignant?-vs-Tumor Size data with a fitted straight line crossing the 0.5 level]
Can we solve the problem using linear regression? E.g., fit a straight line and define a threshold at 0.5.
Threshold classifier output at 0.5:
If hΘ(x) ≥ 0.5, predict "y = 1"
If hΘ(x) < 0.5, predict "y = 0"
Classification
[Figure: the same data plus one tumor of very large size; the refit line shifts, and the 0.5 threshold now misclassifies some examples]
The threshold classifier fails when a new extreme point is added: refitting the line moves the threshold.
Another drawback of using linear regression for this problem:
Classification requires y = 0 or 1, but a linear hΘ(x) can be > 1 or < 0.

What we need: Logistic Regression, which guarantees 0 ≤ hΘ(x) ≤ 1.

Logistic Regression Model

hΘ(x) = g(Θᵀx), where the sigmoid (logistic) function is g(z) = 1 / (1 + e^(−z))

hΘ(x) = estimated probability that y = 1 on input x
Interpretation of Hypothesis Output
hΘ(x) = estimated probability that y = 1 on input x:
hΘ(x) = P(y = 1 | x; Θ), "the probability that y = 1, given x, parameterized by Θ"
Example: If hΘ(x) = 0.7, tell the patient that there is a 70% chance of the tumor being malignant.
Logistic regression is a statistical method used for binary
classification problems. It predicts the probability of an outcome
belonging to a particular class using a logistic function. The technique
is based on the linear relationship between input variables and the log
odds of a categorical response variable. Logistic regression is a
popular method due to its simplicity, interpretability, and ease of
implementation.
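The model above can be sketched in a few lines of plain Python; the parameter values and the tumor-size feature below are made up purely for illustration, not learned from data:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, theta):
    """h_theta(x) = g(theta . x): estimated P(y = 1 | x)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical tumor example: x = [1 (bias term), tumor_size]
theta = [-3.0, 1.5]                  # assumed, already-learned parameters
p = predict_proba([1.0, 2.6], theta)
print(round(p, 3))                   # estimated P(malignant), about 0.711
label = 1 if p >= 0.5 else 0         # threshold the probability at 0.5
```

A probability near 0.5 means the model is unsure; thresholding at 0.5 recovers the hard class label.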
Odds and Log Odds
Log odds is a concept used in logistic regression to express the
relationship between the probability of an event happening and the
odds of that event happening.
Odds: The odds of an event are the ratio of
the probability that the event will happen to the probability that it will
not happen.
If the probability of an event happening is 0.75, then odds = 0.75 / 0.25 = 3, which means the event is three times as likely to happen as not to happen.
Log Odds (also called the "logit"): The log odds is simply the natural logarithm (log base e) of the odds:
logit(p) = ln(p / (1 − p))

The log odds transform the range of probabilities (which are between 0
and 1) into a range between −∞ and +∞ , making it easier to model
probabilities in a regression context.
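These two definitions can be checked numerically; a minimal sketch using only the standard library:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: ln(p / (1 - p)), mapping (0, 1) onto (-inf, +inf)."""
    return math.log(odds(p))

print(odds(0.75))    # 3.0: three times as likely to happen as not
print(logit(0.5))    # 0.0: even odds
print(logit(0.75))   # ln(3), about 1.099
```

Note how p = 0.5 maps to log odds 0, and probabilities above/below 0.5 map to positive/negative log odds.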
Why Use Log Odds in Logistic Regression?
In logistic regression, the relationship between the predictor
variables (such as age, income, etc.) and the binary outcome
(like yes/no or 0/1) is non-linear. To make it linear and fit into
the regression framework, the model predicts the log odds of
the outcome. The coefficients from a logistic regression model
are associated with the log odds, meaning each coefficient
represents how a one-unit change in a predictor variable affects
the log odds of the outcome.
Example:
If you have a logistic regression model predicting whether a
person buys a product based on their income, the coefficient for
income shows how a one-unit increase in income affects the log
odds of purchasing the product.
Odds Ratio
The odds ratio (OR) is a measure of association between a particular
predictor variable and an outcome, commonly used in logistic
regression. It tells us how much the odds of the outcome change with
a one-unit increase in the predictor variable.
The odds ratio compares the odds of the outcome for different levels
of the predictor variable. Specifically, it is the ratio of the odds of the
outcome occurring in one group to the odds of it occurring in another
group (often a baseline or reference group).
Interpretation of Odds Ratio:
OR = 1: The odds of the outcome are the same for both groups,
meaning the predictor has no effect on the outcome.
OR > 1: The odds of the outcome increase as the predictor increases.
For instance, an odds ratio of 2 means the odds of the outcome are
twice as high in group 1 compared to group 2.
OR < 1: The odds of the outcome decrease as the predictor increases.
For example, an odds ratio of 0.5 means the odds of the outcome are
half as high in group 1 compared to group 2.
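Since coefficients act on the log odds, exponentiating a coefficient yields the odds ratio for a one-unit increase in that predictor. A sketch (the coefficient value is made up, not from a fitted model):

```python
import math

# Assumed coefficient for one predictor (e.g., income) in a logistic model
beta = 0.693

# A one-unit increase in the predictor multiplies the odds by e^beta
odds_ratio = math.exp(beta)
print(round(odds_ratio, 2))  # about 2.0: the odds roughly double per unit
```

Equivalently, a negative coefficient gives an odds ratio below 1, i.e., the odds shrink as the predictor increases.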
Assumptions of Logistic Regression
Logistic regression has several assumptions:
1) The dependent variable must be binary;
2) The observations are independent;
3) There is little to no multicollinearity among the predictors;
4) The independent variables are linearly related to the log odds
of the outcome;
5) The model assumes a large sample size for reliable results.
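Assumption 3 can be screened before fitting, for instance with pairwise correlations between predictors. A minimal sketch with made-up data (the predictor values and the "near 1" rule of thumb are illustrative only):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical predictors: x2 is almost a linear copy of x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.0, 8.1, 9.9]
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]

print(round(pearson(x1, x2), 3))  # near 1 -> multicollinearity warning
print(round(pearson(x1, x3), 3))  # small magnitude -> little collinearity
```

Highly correlated predictors make the individual coefficients (and hence their odds ratios) unstable, even if overall predictions stay reasonable.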
Decision Boundary
We are attempting to separate two given sets/classes of points, i.e., to separate two regions of the feature space.
Finding a good decision boundary => learn appropriate values for the parameters Θ.
[Figure: two classes of points in the (x1, x2) plane, separated by a straight-line decision boundary]
Predict "y = 1" if hΘ(x) ≥ 0.5, i.e., if Θᵀx ≥ 0.

How to get the parameter values – will be discussed soon.


Non-linear decision boundaries
We can learn more complex decision boundaries when the hypothesis function contains higher-order terms.
[Figure: a non-linear (e.g., circular) decision boundary in the (x1, x2) plane]
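As an illustration, a linear and a circular decision boundary can be evaluated as below; both parameter settings are hypothetical, chosen only to make the boundaries concrete:

```python
def predict_linear(x1, x2, theta):
    """Predict y = 1 when theta0 + theta1*x1 + theta2*x2 >= 0."""
    return 1 if theta[0] + theta[1] * x1 + theta[2] * x2 >= 0 else 0

def predict_circular(x1, x2):
    """Higher-order terms: assumed boundary -1 + x1^2 + x2^2 = 0,
    so predict y = 1 outside the unit circle."""
    return 1 if -1.0 + x1 ** 2 + x2 ** 2 >= 0 else 0

theta = [-3.0, 1.0, 1.0]                 # assumed boundary x1 + x2 = 3
print(predict_linear(1.0, 1.0, theta))   # 0: below the line
print(predict_linear(2.5, 2.5, theta))   # 1: above the line
print(predict_circular(0.2, 0.3))        # 0: inside the circle
print(predict_circular(1.5, 0.0))        # 1: outside the circle
```

The only difference between the two is which features enter Θᵀx: raw inputs give a line, squared terms give a circle.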
Multi-class classification: one vs. all
Examples of multiclass classification:
News article tagging: Politics, Sports, Movies, Religion, …
Medical diagnosis: Not ill, Cold, Flu, Fever
Weather: Sunny, Cloudy, Rain, Snow

How to use the estimated probability?

• Refraining from classifying unless confident


• Ranking items
• Multi-class classification
Multi-class classification
[Figure: binary classification separates two classes of points in the (x1, x2) plane; multi-class classification has three classes]

One-vs-all (one-vs-rest):
[Figure: the three-class problem is split into three binary problems, each separating one class (Class 1, Class 2, or Class 3) from the remaining two]
Multinomial/ Multi-class Logistic Regression
Multinomial logistic regression is an extension of binary logistic
regression used when the dependent variable has more than two
categories. The method models the probability of each category
separately, using one of the categories as a reference. It is
commonly applied in scenarios with more than two possible
outcomes, such as predicting the type of disease from symptoms.

One-vs-all
Train a logistic regression classifier hΘ⁽ⁱ⁾(x) for each class i to predict the probability that y = i. On a new input x, to make a prediction, pick the class i that maximizes hΘ⁽ⁱ⁾(x).
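The one-vs-all rule can be sketched as follows; the three parameter vectors are made up, standing in for classifiers that would normally be learned from data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(x, thetas):
    """thetas holds one parameter vector per class; pick the class whose
    binary classifier reports the highest probability."""
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x)))
              for theta in thetas]
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical learned parameters for 3 classes; x = [1 (bias), x1, x2]
thetas = [
    [0.5, 2.0, -1.0],    # classifier for class 0 vs. rest
    [0.0, -1.0, 2.0],    # classifier for class 1 vs. rest
    [-0.5, -1.0, -1.0],  # classifier for class 2 vs. rest
]
print(predict_one_vs_all([1.0, 3.0, 0.0], thetas))  # class 0 wins
print(predict_one_vs_all([1.0, 0.0, 3.0], thetas))  # class 1 wins
```

Each classifier scores the input independently; the argmax over the per-class probabilities gives the final label.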
How to evaluate a model?

• Regression
– Some measure of how close the predicted values (from a model) are to the actual values

• Classification
– Whether the predicted classes match the actual classes
Evaluation metrics for Regression

• Mean Squared Error (MSE)


– For every data point, compute error (distance between
predicted value and actual value)
– Sum squares of these errors, and take average
– More popular variant: RMSE (square root of MSE)
• R2 or R-squared
– A naïve Simple Average Model (SAM): for every point,
predict the average of all points
– R2: 1 – (error of model / error of SAM)
– Best possible R2 is 1; can be negative for a really bad model
R2 or R-squared
• Dataset has N instances <xi, yi>, i = 1..N
• Predicted values: fi, i = 1..N
• Mean of actual values: ȳ = (1/N) Σ yi
• Residual sum of squares: SSres = Σ (yi − fi)²
• Total sum of squares: SStot = Σ (yi − ȳ)² (proportional to the variance of the actual values)
• R2 = 1 − SSres / SStot
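The regression metrics above can be computed directly from their definitions; the actual and predicted values below are made up for illustration:

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of squared prediction errors."""
    return sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def r_squared(actual, predicted):
    """R^2 = 1 - SSres / SStot."""
    y_bar = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
print(mse(actual, predicted))        # 0.125
print(r_squared(actual, predicted))  # 0.975

# The naive Simple Average Model predicts the mean everywhere, so its R^2 is 0
sam = [sum(actual) / len(actual)] * len(actual)
print(r_squared(actual, sam))        # 0.0
```

A model worse than always predicting the mean has SSres > SStot, which is how R² goes negative.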
Example: Fingerprint verification
• Input a fingerprint, classify as known identity or intruder
• Application 1: Supermarket verifies customers for giving a discount
• Application 2: For entering into RAW, GoI
On what data to measure precision, recall, error rate, …?
Option 1: the training set
Option 2: some other set of examples that was unknown at the time of training (the test set)

Motivation for ML: learn a model that performs well on (generalizes well to) unknown examples.
Option 2 gives better guarantees for the generalization of a learnt model.
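The train/test protocol can be sketched end to end; the synthetic dataset, the 75/25 split, and the trivial threshold "model" are all assumptions chosen to keep the example self-contained:

```python
import random

random.seed(0)

# Made-up dataset: label is 1 when the feature exceeds 5, plus some noise
data = []
for _ in range(200):
    x = random.uniform(0.0, 10.0)
    y = 1 if x + random.uniform(-1.0, 1.0) > 5.0 else 0
    data.append((x, y))

# Option 2: hold out a test set that is unknown at training time
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

def accuracy(examples, threshold):
    """Fraction of examples a simple threshold rule classifies correctly."""
    correct = sum(1 for x, y in examples
                  if (1 if x > threshold else 0) == y)
    return correct / len(examples)

# Tune the threshold on the training set only...
best = max([i * 0.1 for i in range(101)], key=lambda t: accuracy(train, t))
# ...then report performance on the held-out test set
print(accuracy(train, best), accuracy(test, best))
```

Training accuracy is typically the more optimistic of the two numbers; the test-set figure is the one that speaks to generalization.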
