Lect7 Math231

The document discusses logistic regression, which models the probability of binary outcomes as a function of predictor variables. It provides background on logistic regression and how it addresses limitations of linear regression for binary outcomes. An example analyzes the relationship between age and coronary heart disease using logistic regression. The results show that the odds of coronary heart disease increase by 11.6% for each additional year of age.


Statistics

Logistic Regression

Shaheena Bashir

Fall 2019
Outline

Background
Introduction
  Logit Transformation
  Assumptions
Estimation
Example
  Analysis
  How Good is the Fitted Model?
Single Categorical Predictor
Types of Logistic Regression Models


Background

Motivating Example

Background

Scatter Plot
Relationship between Age & CHD

[Scatter plot: Coronary heart disease (0 = absent, 1 = present) vs. Age (years), ages 20-70. The points fall on two horizontal bands at 0 and 1.]

Not informative!!
Background

Regression Model: Objective

- Describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables by some regression model (equation).
- Predict some future outcome based on the regression model.

How to model the relationship of CHD with age?

Background

Background

- What distinguishes a logistic regression model from the linear regression model is that the outcome variable is binary (or dichotomous), e.g.:
  - Whether a tumor is malignant (Yes=1) or not (No=0)
  - Whether a newborn baby has low birth weight (Yes=1) or not (No=0)
  - Whether a student gets admission at LUMS (Yes=1) vs. not (No=0)

For a categorical response variable, the assumption that the errors follow a normal distribution fails.

Background

Tabular Form of CHD Data

Age Group      n    CHD Present    Proportion with CHD

20-29         10         1               0.10
30-34         15         2               0.13
35-39         12         3               0.25
40-44         15         5               0.33
45-49         13         6               0.46
50-54          8         5               0.63
55-59         17        13               0.76
60-69         10         8               0.80
Total        100        43

Background

Proportion of Individuals with CHD


Relationship between Age & CHD

[Figure: proportion with CHD (y-axis, 0-1) vs. Age (years, 20-70), one point per age group; the proportion rises with age.]

Introduction

Logistic Regression Model

- The response variable in logistic regression is categorical. The linear regression model, i.e., Y = Xβ + ε, does not work well for a few reasons:
  - The response values, 0 and 1, are arbitrary, so modeling the actual values of Y is not exactly of interest.
  - Our interest is in modeling the probability that each individual in the population responds with 0 or 1.
  - The error terms in this case do not follow a normal distribution.

Thus, we might consider modeling P, the probability, as the response variable.

Introduction

Sigmoid Function

Modeling the probability as the response raises some problems:

- Although the probability generally increases with age, we know that P, like all probabilities, can only fall within the boundaries of 0 and 1.
- It is better to assume that the relationship between age and P is sigmoidal (S-shaped), rather than a straight line.
- It is possible, however, to find a linear relationship between age and a function of P. Although a number of functions work, one of the most useful is the logit function.

Introduction
Logit Transformation

Logit Function
The logit function ln(p/(1−p)) (also called the log-odds) is simply the log of the ratio of P(Y = 1) to P(Y = 0):

    ln(p/(1−p)) = Xβ

The odds:

    p/(1−p) = exp(Xβ)

Solving for p,

    p = Pr(Y = 1 | X = x) = exp(y)/[1 + exp(y)] = 1/[1 + exp(−y)],  where y = Xβ,

gives the standard logistic (sigmoid) function.
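Base R implements this pair of transformations as qlogis() (the logit) and plogis() (the inverse logit); a quick illustration, using the 55-59 age group's proportion from the earlier table:

p <- 0.76                    # proportion with CHD in the 55-59 age group
qlogis(p)                    # log-odds: log(0.76/0.24), about 1.15
plogis(qlogis(p))            # the inverse logit recovers p = 0.76
1 / (1 + exp(-qlogis(p)))    # same value, computed by the formula above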
Introduction
Logit Transformation

Logit Function

g(x) = ln(p/(1−p)) has many of the desirable properties of a linear regression model:

- It may be continuous
- It is linear in the parameters
- It has the potential for a range between −∞ and +∞, depending on the range of x

Introduction
Logit Transformation

Summary: Logit Transformation

Quantity               Formula          min     max

Probability            p                 0       1
Odds                   p/(1−p)           0       ∞
Logit or 'log-odds'    ln(p/(1−p))      −∞       ∞

The logit stretches the probability scale.

Introduction
Assumptions

Assumptions

                        Linear Regression     Logistic Regression

Error distribution      ε ∼ N(0, σ²)          ε ∼ Bin(p)
Model                   Y = Xβ + ε            ln(p/(1−p)) = Xβ + ε
Conditional dist.       Y|X ∼ N(Xβ, σ²)       Y|X ∼ Bin(p)

Estimation

Estimation of Parameters of Regression Model: β

- The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data.
- For logistic regression the likelihood equations are non-linear in the parameters β and require special methods for their solution.
- These methods are iterative in nature and have been programmed into available logistic regression software.
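The iteration can be sketched in a few lines of R. This is a minimal illustration of the idea (irls_logit is a hypothetical helper, not the slides' code); it assumes a design matrix X whose first column is all 1s and a 0/1 response vector y:

irls_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                # start at beta = 0
  for (i in seq_len(maxit)) {
    eta <- as.vector(X %*% beta)         # linear predictor X beta
    p   <- 1 / (1 + exp(-eta))           # current fitted probabilities
    w   <- p * (1 - p)                   # Bernoulli variances = IRLS weights
    z   <- eta + (y - p) / w             # working response
    beta_new <- as.vector(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}
# e.g., for the CHD data (chd coded 0/1):
# irls_logit(cbind(1, chdage$age), chdage$chd)   # close to coef(glm(...))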

Example

Example: CHD Data

- Is age a risk factor for CHD? How does the probability of CHD change with age?
- Outcome variable: CHD (Yes, No)
- Predictor: Age (in years)

Logistic regression models the probability of some event occurring as a linear function of a set of predictors (on the logit scale).

Example
Analysis

CHD Analysis

ln(p̂/(1−p̂)) = −5.31 + 0.11 Age

- The coefficient is interpreted as the MARGINAL increase in the log odds of CHD when age increases by 1 year.

              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     −5.31         1.13     −4.68       0.00
age              0.11         0.02      4.61       0.00

OR = exp(0.11) = 1.116
The odds of getting CHD are multiplied by 1.116 (an increase of about 11.6%) when age increases by 1 year.
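In R, the odds ratio comes straight from the fitted object (mod1 as defined on the R Software slide below), and exponentiating Wald confidence limits gives an interval on the OR scale:

exp(coef(mod1)["age"])       # about 1.116, the OR above
exp(confint.default(mod1))   # Wald 95% confidence intervals, OR scale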

Example
Analysis

Fitted Values

p = exp(β0 + β1 X) / [1 + exp(β0 + β1 X)]
  = exp(−5.31 + 0.11 Age) / [1 + exp(−5.31 + 0.11 Age)]
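For instance, base R's plogis() (the inverse logit) evaluates this fitted curve at chosen ages; with the rounded coefficients above:

plogis(-5.31 + 0.11 * c(30, 50, 70))   # about 0.12, 0.55, 0.92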

Example
Analysis

R Software

mod1 <- glm(chd ~ age, family = binomial, data = chdage)
summary(mod1)
predict(mod1, type = "response")
anova(mod1, test = "Chisq")
plot(mod1)

Example
Analysis

Predicted Probabilities

[Figure: predicted probabilities from mod1 vs. Age (years, 20-70); the fitted probabilities rise in an S-shape with age.]
Example
How Good is the Fitted Model?

Analysis of Deviance
Model: binomial, link: logit
Terms added sequentially (first to last)

        Df   Deviance   Resid. Df   Resid. Dev    Pr(>Chi)
NULL                        99        136.66
Age      1     29.31        98        107.35     6.168e-08 ***

- Deviance is a measure of goodness of fit of a generalized linear model. Or rather, it's a measure of badness of fit: higher values indicate a worse fit.
- If our new model explains the data better than the null model, there should be a significant reduction in the deviance, which can be tested against the chi-square distribution to give a p-value.
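The drop in deviance, 136.66 − 107.35 = 29.31 on 1 df, can be checked against the chi-square distribution directly:

pchisq(136.66 - 107.35, df = 1, lower.tail = FALSE)   # about 6.2e-08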
Example
How Good is the Fitted Model?

Hosmer-Lemeshow Goodness of Fit

How well our model fits depends on the difference between the model and the observed data.

library(ResourceSelection)
hoslem.test(as.numeric(chdage$chd) - 1, fitted(mod1))

R Output:
Hosmer and Lemeshow goodness of fit (GOF) test
data: as.numeric(chdage$chd) - 1, fitted(mod1)
X-squared = 2.2243, df = 8, p-value = 0.9734

Our model appears to fit well, because there is no significant difference between the model and the observed data (i.e., the p-value > 0.05).
o
23/29
Single Categorical Predictor

Simple Logistic Regression Model with a Categorical Predictor

- How some function of the probability of a categorical response is linearly related to a predictor
- Interpretation of the resulting intercept β0 and the slope β1 when the predictor variable is also binary

Single Categorical Predictor

Case-Control Study: A Recap Example

Past exposure    CHD Cases    Controls (without disease)

Smokers             112              176
Non-smokers          88              224
Totals              200              400

Odds of CHD for Smokers = 112/176 ≈ 0.64
Odds of CHD for Non-smokers = 88/224 ≈ 0.39

Single Categorical Predictor

Case-Control Study: A Recap Example Cont’d

Let yi be the binary response variable:

- yi = 1 if CHD = yes
- yi = 0 if CHD = no

Past exposure     yi     ni
Smokers          112    288
Non-smokers       88    312

Then yi ∼ Bin(ni, pi).

Let xi be the binary predictor of past smoking:

- xi = 1 if past smoker
- xi = 0 if non-smoker in the past
Single Categorical Predictor

Case-Control Study: A Recap Example Cont’d

The probability of CHD, pi, can be modeled as:

    logit(pi) = β0 + β1 xi

- If xi = 1, then logit(pi | xi = 1) = β0 + β1
- If xi = 0, then logit(pi | xi = 0) = β0

    β1 = logit(pi | xi = 1) − logit(pi | xi = 0) = ln[ odds(xi = 1) / odds(xi = 0) ]

∴ OR = exp(β1)
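As a sketch (not from the slides), the same model can be fitted in R from the aggregated counts above, using glm's grouped-binomial interface cbind(successes, failures):

y <- c(112, 88)     # CHD cases: smokers, non-smokers
n <- c(288, 312)    # group sizes
x <- c(1, 0)        # past-smoking indicator
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
coef(fit)           # intercept about -0.93, slope about 0.48 (next slide)
exp(coef(fit))      # baseline odds about 0.39; OR about 1.62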

Single Categorical Predictor

Example: Logistic Regression

              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     −0.93         0.13     −7.43       0.00
pastsmoke1       0.48         0.17      2.76       0.01

- For past smokers (xi = 1): ln(odds of CHD) = β0 + β1 = −0.45, ∴ odds for smokers = exp(−0.45) ≈ 0.64
- For past non-smokers (xi = 0): ln(odds of CHD) = β0 = −0.93, ∴ odds for non-smokers = exp(−0.93) ≈ 0.39

OR = exp(0.48) ≈ 1.62

Types of Logistic Regression Models

Types of Logistic Regression Model

- Binary Logistic Regression Model: the categorical response is dichotomous (has only two possible outcomes), e.g., an email is Spam or Not.
- Multinomial Logistic Regression Model: three or more categories without ordering (polytomous response), e.g., predicting food choices (Veg, Non-Veg, Vegan).
- Ordinal Logistic Regression Model: three or more categories with ordering, e.g., movie ratings from 1 to 5, teaching evaluations by students, etc.
