0% found this document useful (0 votes)
23 views

Logistic Regression

The document discusses using logistic regression for classification problems. Logistic regression predicts the probability of an outcome by fitting data to a logit function. It is used for problems with a binary dependent variable that is coded as 1 or 0. The document provides an example of using logistic regression to predict customer subscription based on age and discusses representing binary outcomes and categorical variables. It also addresses issues with using a linear model for a binary outcome and how logistic regression addresses these.

Uploaded by

Yash Pancholi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views

Logistic Regression

The document discusses using logistic regression for classification problems. Logistic regression predicts the probability of an outcome by fitting data to a logit function. It is used for problems with a binary dependent variable that is coded as 1 or 0. The document provides an example of using logistic regression to predict customer subscription based on age and discusses representing binary outcomes and categorical variables. It also addresses issues with using a linear model for a binary outcome and how logistic regression addresses these.

Uploaded by

Yash Pancholi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

10-09-2023

Classification Problem – With the help of


Logistic Regression
• What types of problem can we answer using logistic regression?
• Whether customer will default on loan payment or not?

Logistic Regression •
Whether email/SMS is spam or not?
This year, an employee will promoted or not?
• A person has disease or not?
• Whether a person will purchase or not?
• Whether a user skip the add or watch till end?

Logistic Regression Logistic Regression


• In both the tables, Age and • Logistic regression is a statistical method for analysing a dataset in which
Gender are independent there are one or more independent variables that determine a binary
variable. outcome.
• In Table 1, we are trying to • Logistic Regression is a classification algorithm.
predict the revenue which is a • It predicts the probability of occurrence of an event by fitting data to a logit
continuous dependent variable. function.
• In Table 2, we are trying to • In logistic regression, the dependent variable is binary or dichotomous, i.e.
predict the probability of it only contains data coded as 1 (TRUE, Success, Pregnant, etc.) or 0 (FALSE,
subscription based on gender Failure, Non-Pregnant, etc.)
and age which is a binary • We use dummy variables to represent binary / categorical outcome, for
dependent variable. example we will use “1” for success in exam and “0” for failure in exam.

Continuous Vs. Categorical Variable


Representing the binary Outcome
• General linear regression model • The binary outcome have two potential values, usually an observation belongs to a certain category or has
satisfied some particular attribute going to be a yes/no question.
• 𝑦 = 𝛽0 + 𝛽1*𝑥1 + 𝛽2*𝑥2 + 𝜀 • Create a dummy variable to indicate an observation is Yes or No:
• If the answer is Yes dependent variable is 1
• Independent variable(x’s): • If the answer is No dependent variable is 0
• Continuous: Age, income, height -> Uses Numerical value • If we code our dependent variable the other way around ( Yes is “0” and No is “1”) the coefficients are going
to have the same magnitudes but the opposite signs.
• Categorical: gender, city, ethnicity -> Uses dummies for example: For
Male use “0” and for female “1” For example:
• In case of student admission, we are interested in finding out the probability of a student obtaining a place
• Dependent Variable (y): on a post-graduate course given the marks from undergraduate degree and the institution they attended.
• In the context of politics we could assess if this person will vote against or in favour of a particular law.
• Continuous: consumption, time spent -> Uses Numerical value
• In a retail context, a customer buy or did not buy a product.
• Categorical: Yes/No -> Uses dummies • An individual transactions, use a binary outcome to model if that particular transaction is legal or fraudulent.

1
10-09-2023

Let see an Example Let’s Apply Linear Model…???


• Netflix conducted a marketing activity on its 500 customers out of • For the above model we can also use the linear model. Only problem
which some customers subscribed the channel whereas some did not. we may face is that the dependent variable is binary instead of
Now, Netflix wants to analyse the success of their marketing continuous.
campaign. They have taken a sample of 20 customers and want to • If we want to use the linear model for this problem , then we need to
analyse the results. change the variable “No” to “0” and variable “Yes” to “1” and
• Subscribe: Indicates a customer has subscribed to a magazine. whenever customer changing from 0 to 1, it increases the likelihood
• Age(Continuous variable): Examine how age influences the likelihood of subscription.
of subscription • So, then we can run a simple linear model
• 𝑆𝑢𝑏𝑠𝑐𝑟𝑖𝑏𝑒 = 𝛽0 + 𝛽1∗𝑎𝑔𝑒 + 𝜀

Interpretation of Result Problems with the linear Approach


• If our dependent variable is binary, then we want to see what makes • The Probabilities are bounded between (0≤𝑝≤1)
it change from 0 to 1. • The range of age in our data is between 18 ≤ 𝑎𝑔𝑒 ≤ 62 so, the
• This can be interpreted as what increases the likelihood of youngest customer is 18 year old and the oldest customer is 62 year
subscription, or P(subscription = 1), which we can also simply denote old.
as p. • It only makes sense to develop a forecasts for observations similar to
• The result can be interpreted as: the ones we have in our data
• 𝑝(𝑠𝑢𝑏𝑠𝑐𝑟𝑖𝑏𝑒 = 1) = 𝑝 = -0.866 + 0.03 * age • Lets assume that the probability of a 40 year old person subscribe is:
• Every additional year of age increases the probability of subscription
by 3%. • What about people with 26 and 57 years of age?

• The one shown with those breaks in the function but this is two
engineered way to custom to be a standard approach Fixing the Prior Approach
• Could we do something better and let's think what should we do to
fix this again note that probabilities should be between 0 and 1
• We need to somehow constrain p such that 0≤𝑝≤1
• We know p = f(age), but the linear function didn’t work.
• What must f(.) satisfy to always produce reasonable forecasts?
• f(.) must satisfy two things:
• It must always be positive (since p ≥0)
• It must be less than 1 (since p ≤1)

2
10-09-2023

Two steps The linear thinking is not completely gone


• Lets try to develop a new function that will satisfy these two criteria
• It must always be positive (since p ≥0) • The previous expression (by doing some algebra) can be rewritten as:
• What functions could give you a positive numbers
• The absolute value of a number
• The squared version of number
• The alternative to this is an exponential form
• P being the result of the prior expression is equal to a linear function
• 𝑝 = exp(𝛽0+𝛽1∗𝑎𝑔𝑒) = 𝑒(𝛽0+𝛽1∗𝑎𝑔𝑒)
of age that looks just like the linear simple regression models.
• For example if (𝛽0+𝛽1∗𝑎𝑔𝑒) is -2, then exp(-2) = 0.136 (Use excel function
“exp” to find exponential value. • Even though the probability of the customer subscribing (p) is not
• It must be less than 1(since p ≤1) linear function of age, we can perform a simple transformation on it
such that it is now a linear function of age.
• The above equation is used in Logistic Regression.
• For example if exp(𝛽0+𝛽1∗𝑎𝑔𝑒) is 1.2 , to make it less than one , we can do :
1.2/(1.2+1) = 1.2/2.2 = 0.55

Logistic Regression in Excel Model building steps


• To solve a logistic regression is excel we need to use a real stat resource pack which is available in
below link. 1. Source data
• http://www.real-statistics.com/free-download/real-statistics-resource-pack/ 2. Data cleaning
• To use a real stat you first need to download a real stat resource pack as per excel version.
• First install excel solver add-in available in excel then install downloaded real stat resource pack. 3. Exploratory Data Analysis
4. Divide data – Train and Test
• Steps to install solver are mentioned below
• Before installation of Real Statistics Resource Pack install add in steps to install add in is 5. Check model specific assumptions
mentioned below.
• Open a blank Excel spreadsheet 6. Build model
• Click File -> Options -> add ins -> Excel add ins -> go. 7. Evaluate model
• A dialog box will open check solver add ins
• click ok

• Accuracy: Overall, how often is the classifier correct?


(TP+TN)/(N+P)
• Misclassification/Error Rate: Overall, how often is it wrong?
(FP+FN)/(N+P)
• True Positive Rate (TPR) / Sensitivity : When it's actually yes, how often does it
predict yes?
TP/P
• True Negative Rate (TNR) / Specificity: When it's actually no, how often does it
predict no?
TN/N
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/

3
10-09-2023

Reading References
• https://medium.com/greyatom/logistic-regression-union-between-
regression-and-classification-8a71b77562ae
• https://classeval.wordpress.com/introduction/basic-evaluation-measures/
• https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
• https://towardsdatascience.com/decoding-the-confusion-matrix-
bb4801decbb
• https://www.hackerearth.com/practice/machine-learning/machine-
learning-algorithms/logistic-regression-analysis-r/tutorial/

• https://www.real-statistics.com/free-download/real-statistics-resource-
pack/ Excel add-in for logistic regression

You might also like