Basic Concepts of Logistic Regression
The basic approach is to use the following regression model, employing the notation from Definition 3 of Method of Least Squares for Multiple Regression:

$$\ln\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

where $\pi$ = P(y = 1).
Definition 1: Where p has a value 0 ≤ p ≤ 1 (i.e. p is a probability value), we can define the odds function as

$$\text{Odds}(p) = \frac{p}{1-p}$$

For an event E, we set p = P(E), and so Odds(E) = P(E)/(1 − P(E)).
Observation: For our purposes, the odds function has the advantage of transforming the probability function, which has values from 0 to 1, into an equivalent function with values between 0 and ∞. When we take the natural log of the odds function, we get a range of values from −∞ to +∞.
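To make the two transformations concrete, here is a minimal Python sketch (the code and its function names are illustrative, not part of the original exposition): it shows the odds function carrying (0, 1) to (0, ∞) and the log of the odds carrying it to (−∞, +∞).

```python
import math

def odds(p):
    """Odds function: maps a probability in (0, 1) to (0, infinity)."""
    return p / (1 - p)

def log_odds(p):
    """Log of the odds: maps a probability in (0, 1) to (-inf, +inf)."""
    return math.log(odds(p))

# As p sweeps from near 0 to near 1, odds(p) grows from near 0 toward
# infinity, while log_odds(p) is symmetric about p = 0.5.
for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(f"p = {p:4.2f}   odds = {odds(p):7.3f}   ln odds = {log_odds(p):7.3f}")
```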
Definition 2: The logit function is the log of the odds function, namely logit(E) = ln Odds(E), or

$$\text{logit}(p) = \ln\frac{p}{1-p} = b_0 + b_1 x_1 + \cdots + b_k x_k$$

and so

$$p = \frac{e^{b_0+b_1x_1+\cdots+b_kx_k}}{1+e^{b_0+b_1x_1+\cdots+b_kx_k}} = \frac{1}{1+e^{-(b_0+b_1x_1+\cdots+b_kx_k)}}$$
Here we switch to the model based on the observed sample (and so the parameter π is replaced by its sample estimate p, the βj coefficients are replaced by the sample estimates bj, and the error term is dropped). For our purposes we take the event E to be that the dependent variable y has value 1. If y takes only the values 0 or 1, we can think of E as success and the complement E′ of E as failure, just as for the trials in a binomial distribution.
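Since the logit and the expression for p above are inverses of each other, solving logit(p) = b0 + b1x1 + ⋯ + bkxk for p yields the second formula of Definition 2. A small Python sketch of this relationship (the function names logit and logistic are ours):

```python
import math

def logit(p):
    """logit(p) = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of the logit: p = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

z = 1.7                            # an arbitrary value of b0 + b1*x1 + ... + bk*xk
p = logistic(z)                    # the probability the model assigns
assert abs(logit(p) - z) < 1e-12   # logit undoes logistic
print(round(p, 4))                 # 0.8455
```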
Just as for the regression model studied in Regression and Multiple Regression, a sample consists of n data elements of the form (yi, xi1, …, xik), but for logistic regression each yi only takes the value 0 or 1. Now let Ei = the event that yi = 1 and pi = P(Ei).

Definition 3: Just as the regression line studied previously provides a way to predict the value of the dependent variable y from the values of the independent variables x1, …, xk, for logistic regression we have

$$p_i = \frac{1}{1+e^{-(b_0+b_1x_{i1}+\cdots+b_kx_{ik})}}$$

Note too that since the yi have a proportion distribution, by Property 2 of Proportion Distribution, var(yi) = pi(1 − pi).
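As an illustration, the following sketch computes pi and var(yi) for a single data element; the coefficients and x values here are hypothetical, chosen only to show the arithmetic.

```python
import math

def predict_p(b, x):
    """p_i = 1 / (1 + e^-(b0 + b1*xi1 + ... + bk*xik)) for one data element."""
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return 1 / (1 + math.exp(-z))

b = [-1.0, 0.8, 0.3]    # hypothetical coefficients b0, b1, b2
x = [1.5, 2.0]          # one data element's independent variables xi1, xi2
p = predict_p(b, x)
var_y = p * (1 - p)     # var(yi) = pi(1 - pi) for a proportion distribution
print(round(p, 4), round(var_y, 4))
```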
Observation: In the case where k = 1, we have

$$p = \frac{e^{b_0+b_1x}}{1+e^{b_0+b_1x}} = \frac{1}{1+e^{-(b_0+b_1x)}}$$
Observation: Logistic regression is used instead of ordinary multiple regression because the assumptions required for ordinary regression are not met. In particular:
1. The assumption of the linear regression model that the values of y are normally distributed
cannot be met since y only takes the values 0 and 1.
2. The assumption of the linear regression model that the variance of y is constant across values of x (homogeneity of variances) also cannot be met with a binary variable. Since the variance is p(1 − p), when 50 percent of the sample consists of 1s the variance is .25, its maximum value. As we move to more extreme values, the variance decreases: when p = .10 or .90, the variance is (.1)(.9) = .09, and as p approaches 0 or 1, the variance approaches 0 (see the sketch after this list).
3. Using the linear regression model, the predicted values can become greater than one or less than zero if you move far enough along the x-axis. Such values are theoretically inadmissible for probabilities.
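The following quick numerical check illustrates points 2 and 3; the straight-line coefficients in the second part are hypothetical.

```python
# Point 2: the variance p(1 - p) peaks at p = .5 and shrinks toward 0 and 1.
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"p = {p}  var = {p * (1 - p):.2f}")   # .09 .21 .25 .21 .09

# Point 3: a straight line fitted to probabilities eventually leaves [0, 1].
a, slope = 0.2, 0.05                             # hypothetical intercept and slope
for x in [0, 10, 20]:
    print(f"x = {x:2d}  linear prediction = {a + slope * x:.2f}")  # 0.20 0.70 1.20
```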
For the logistic model, the least squares approach to calculating the values of the coefficients bi cannot be used; instead, the maximum likelihood techniques described below are employed to find these values.
Definition 4: The odds ratio between two data elements in the sample is defined as follows:

$$OR = \frac{\text{Odds}(E_1)}{\text{Odds}(E_2)} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$$

Using the notation px = P(x), i.e. the probability that y = 1 when the independent variable takes the value x, in the case where k = 1 the log odds ratio between two elements whose x values differ by 1 is

$$\ln OR = \text{logit}(p_{x+1}) - \text{logit}(p_x) = (b_0 + b_1(x+1)) - (b_0 + b_1x) = b_1$$

Thus,

$$OR = e^{b_1}$$
E.g. when x = 0 for male and x = 1 for female, $e^{b_1}$ represents the odds ratio between females and males. If, for example, b1 = 2 and we are measuring the probability of getting cancer under certain conditions, then $e^{b_1} = e^2 \approx 7.4$, which would mean that the odds of females getting cancer would be 7.4 times greater than the odds for males under the same conditions.
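The arithmetic of the example can be confirmed in one line (b1 = 2 is the value assumed above):

```python
import math

b1 = 2.0                     # the coefficient from the example above
odds_ratio = math.exp(b1)    # OR = e^b1
print(round(odds_ratio, 1))  # 7.4
```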
Observation: The model we will use is based on the binomial distribution, namely that the probability that the sample data occurs as it does is given by

$$P = \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}$$
Taking the natural log of both sides and simplifying we get the following definition.
Definition 5: The log-likelihood statistic is defined as follows:

$$LL = \ln P = \sum_{i=1}^{n} \left[\, y_i \ln p_i + (1-y_i)\ln(1-p_i) \,\right]$$

where the yi are the observed values and the pi are the corresponding theoretical values.
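Definition 5 translates directly into Python; the observed values y and theoretical probabilities p below are hypothetical, used only to exercise the formula.

```python
import math

def log_likelihood(y, p):
    """LL = sum over the sample of yi*ln(pi) + (1 - yi)*ln(1 - pi)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

y = [1, 0, 1, 1, 0]             # hypothetical observed values
p = [0.8, 0.3, 0.6, 0.9, 0.2]   # hypothetical theoretical probabilities
print(log_likelihood(y, p))     # about -1.419
```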
Observation: Our objective is to find the maximum value of LL, assuming that the pi are as in Definition 3. This will enable us to find the values of the bi coefficients. It might be helpful to review Maximum Likelihood Function to better understand the rest of this topic.
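As a preview of what this maximization involves, here is a sketch using scipy.optimize.minimize on a small hypothetical sample (the data and starting values are ours; the rest of this section instead uses Excel's Solver and Newton's method). Since optimizers minimize, we maximize LL by minimizing −LL.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sample: one independent variable x and binary outcomes y.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

def neg_log_likelihood(b):
    """-LL for coefficients b = (b0, b1), with pi as in Definition 3."""
    z = b[0] + b[1] * x
    p = 1 / (1 + np.exp(-z))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
print(result.x)     # estimated b0, b1
print(-result.fun)  # maximum value of LL
```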
Example 1: A study was made of a sample of 760 people who received doses of radiation between 0 and 1,000 rems following a recent nuclear accident. Of these, 302 died, as shown in the table in Figure 2. Each row in the table represents the midpoint of an interval of 100 rems (i.e. 0–100, 100–200, etc.).
Let Ei = the event that a person in the ith interval survived. The table also shows the probability P(Ei) and odds Odds(Ei) of survival for a person in each interval. Note that P(Ei) = the percentage of people in interval i who survived and

$$\text{Odds}(E_i) = \frac{P(E_i)}{1-P(E_i)}$$
In Figure 3 we plot the values of P(Ei) vs. i and ln Odds(Ei) vs. i. We see that the second of these
plots is reasonably linear.
Given that there is only one independent variable (namely x = # of rems), we can use the following model:

$$\ln\frac{p}{1-p} = a + bx$$

Here we use the coefficients a and b instead of b0 and b1 just to keep the notation simple.
We show two different methods for finding the values of the coefficients a and b. The first uses Excel's Solver tool and the second uses Newton's method. Before proceeding, it might be worthwhile to click on Goal Seeking and Solver to review how to use Excel's Solver tool, and on Newton's Method to review how to apply Newton's method. We will use both methods to maximize the value of the log-likelihood statistic as defined in Definition 5.
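For readers who prefer code to spreadsheets, here is a sketch of how Newton's method can be applied to this maximization. The data below are hypothetical stand-ins, not the Figure 2 radiation data, and the iteration is the standard Newton step beta ← beta − H⁻¹g, where g and H are the gradient and Hessian of LL for the model ln(p/(1−p)) = a + bx.

```python
import numpy as np

# Hypothetical data for the one-variable model ln(p/(1-p)) = a + b*x.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])
X = np.column_stack([np.ones_like(x), x])       # design matrix: 1s column, then x

beta = np.zeros(2)                              # starting values a = b = 0
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))             # current fitted probabilities
    grad = X.T @ (y - p)                        # gradient of LL
    hess = -X.T @ (X * (p * (1 - p))[:, None])  # Hessian of LL
    step = np.linalg.solve(hess, grad)
    beta = beta - step                          # Newton update
    if np.max(np.abs(step)) < 1e-10:            # stop when the step is negligible
        break

a, b = beta
print(a, b)
```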
Sample Size: The recommended minimum sample size for logistic regression is given by 10k/q, where k = the number of independent variables and q = the smaller of the percentage of cases with y = 0 or y = 1, subject to an overall minimum of 100.
For Example 1, k = 1 and q = 302/760 = .397, and so 10k/q = 25.17. Thus a minimum sample of
size 100 is recommended.
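The rule of thumb translates directly into code (the function name is ours):

```python
import math

def min_sample_size(k, q):
    """Recommended minimum sample size: 10k/q, but never less than 100."""
    return max(100, math.ceil(10 * k / q))

# Example 1: k = 1 independent variable, q = 302/760 of cases with y = 0.
print(min_sample_size(1, 302 / 760))   # 100, since 10k/q is only about 25
```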