GLM in R

The document discusses Generalized Linear Models (GLMs), focusing on logistic and Poisson regression, which are used for discrete response variables. It explains how to implement these models in R using the glm() function, along with examples demonstrating logistic regression for predicting car purchases and Poisson regression for analyzing elephant mating counts. The document also covers hypothesis testing, confidence intervals, and the interpretation of model outputs.

Uploaded by

RM Miau

Generalized Linear Models

We have previously worked with regression models where the response variable
is quantitative and normally distributed. Now we turn our attention to two types of
models where the response variable is discrete and the error terms do not follow
a normal distribution, namely logistic regression and Poisson regression. Both
belong to a family of regression models called generalized linear models.

Generalized linear models are extensions of traditional regression models that
allow the mean to depend on the explanatory variables through a link function,
and the response variable to be any member of a set of distributions called the
exponential family (e.g., Normal, Poisson, Binomial).

We can use the function glm() to work with generalized linear models in R. Its
usage is similar to that of the function lm(), which we previously used for
multiple linear regression. The main difference is that we need to include an
additional argument, family, to describe the error distribution and link function
to be used in the model. In this tutorial we show how glm() can be used to fit
logistic regression and Poisson regression models.
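
As a quick illustration of the family argument, the sketch below inspects the family objects that ship with base R and prints the default link function of each. The variable names (fams, links, probit_family) are illustrative only and do not come from the tutorial.

```r
# Each family object carries an error distribution and a default link function.
fams <- list(gaussian(), binomial(), poisson())
links <- sapply(fams, function(f) f$link)
names(links) <- sapply(fams, function(f) f$family)
print(links)

# A non-default link can be requested explicitly, e.g. a probit link.
probit_family <- binomial(link = "probit")
print(probit_family$link)
```

Passing family=binomial or family=poisson to glm() therefore selects the logit or log link, respectively, unless another link is requested.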

A. Logistic Regression

Logistic regression is appropriate when the response variable is categorical with
two possible outcomes (i.e., binary outcomes). Binary variables can be
represented using an indicator variable Yi, taking on values 0 or 1, and modeled
using a binomial distribution with probability P(Yi=1) = πi. Logistic regression
models this probability as a function of one or more explanatory variables.
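
Written out, the model ties the probability to the explanatory variables through the logit (log odds) function. In the notation below, x_{i1}, ..., x_{ip} are the explanatory variables for observation i and β_0, ..., β_p are the regression coefficients. These symbols are standard notation assumed here for illustration.

```latex
\operatorname{logit}(\pi_i)
  = \log\!\left(\frac{\pi_i}{1-\pi_i}\right)
  = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},
\qquad
\pi_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})}
             {1 + \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})}.
```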

To perform logistic regression in R, use the command:

> glm(response ~ explanatory_variables, family=binomial)

Note that the option family is set to binomial, which tells R to perform logistic
regression.
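
For readers who want to try the command without a data file, here is a minimal sketch on simulated data. The data set, the variable names (x, y, sim, fit) and the chosen true coefficients are all hypothetical.

```r
# Simulated data (hypothetical): one explanatory variable x and a binary response y.
set.seed(1)
n <- 200
x <- rnorm(n)
true_prob <- plogis(-0.5 + 1.2 * x)   # inverse logit of the linear predictor
y <- rbinom(n, size = 1, prob = true_prob)
sim <- data.frame(x = x, y = y)

# Supplying data= avoids relying on attach() or on variables in the workspace.
fit <- glm(y ~ x, family = binomial, data = sim)
print(coef(fit))
```

The estimated coefficients should land near the values used in the simulation, although the exact numbers depend on the random draw.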

Ex. A car manufacturer was interested in creating a model for determining the
probability that families will purchase a new car during the next year. A random
sample of 33 suburban families was selected. Data on annual income (in
thousands of dollars) and the current age of the oldest family car (in years) was
obtained. A follow-up interview was conducted a year later to determine whether
or not the family actually purchased a new car during the year (Y=1 if the family
purchased a car and 0 otherwise).

We are interested in determining the probability that a family purchases a new
car given their income and the age of their oldest car.

To read in the data set and fit a logistic regression model we type:

> dat = read.table("Purchase.txt",header=TRUE)
> attach(dat)
> results = glm(new ~ income + age, family=binomial)
> results

Call: glm(formula = new ~ income + age, family = binomial)

Coefficients:
(Intercept)       income          age
   -4.73931      0.06773      0.59863

Degrees of Freedom: 32 Total (i.e. Null); 30 Residual

Null Deviance: 44.99
Residual Deviance: 36.69 AIC: 42.69

According to the output, the model is logit(πi) = -4.74 + 0.068*income + 0.60*age.

After fitting the model, we can test the overall model fit and hypotheses
regarding a subset of regression parameters using a likelihood ratio test (LRT).
Likelihood ratio tests are similar to partial F-tests in the sense that they
compare the full model with a restricted model where the explanatory variables
of interest are omitted. The p-values of the tests are calculated using the χ²
distribution.

To test the hypothesis H0: β1=β2=0 we can compare our model with a reduced
model that only contains an intercept term. A likelihood ratio test comparing the
full and reduced models can be performed using the anova() function with the
additional option test="Chisq".

> results.reduced =glm(new ~ 1, family=binomial)

> anova(results.reduced,results, test="Chisq")
Analysis of Deviance Table

Model 1: new ~ 1
Model 2: new ~ income + age
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        32     44.987
2        30     36.690  2    8.298     0.016
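
The test statistic in this table is the drop in deviance between the two models, and the p-value is the upper tail of a χ² distribution whose degrees of freedom equal the number of parameters tested. A minimal sketch that reproduces both from the printed deviances (the variable names are illustrative):

```r
# Deviances and residual degrees of freedom copied from the table above.
dev_reduced <- 44.987; df_reduced <- 32
dev_full    <- 36.690; df_full    <- 30

lrt_stat <- dev_reduced - dev_full     # drop in deviance
lrt_df   <- df_reduced - df_full       # number of parameters tested
p_value  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

print(round(c(statistic = lrt_stat, df = lrt_df, p.value = p_value), 3))
```

Because the inputs are the rounded deviances, the statistic comes out as 8.297 rather than the 8.298 that R computes from the unrounded values.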

The likelihood ratio test statistic is χ² = 8.298 with a p-value of 0.016. Hence,
we have relatively strong evidence in favor of rejecting H0.

As a next step, we perform tests on the individual regression parameters.

> summary(results)

Call:
glm(formula = new ~ income + age, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6189  -0.8949  -0.5880   0.9653   2.0846

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.73931    2.10195  -2.255   0.0242 *
income       0.06773    0.02806   2.414   0.0158 *
age          0.59863    0.39007   1.535   0.1249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Null deviance: 44.987 on 32 degrees of freedom

Residual deviance: 36.690 on 30 degrees of freedom
AIC: 42.69

Number of Fisher Scoring iterations: 4

To test H0: β1=0, we use z = 2.414 (p-value=0.0158). Hence, the family’s income
appears to have a significant impact on the probability of purchasing a new car,
while controlling for the age of the family’s oldest car.

To test H0: β2=0, we use z = 1.535 (p-value=0.1249). Hence, the age of a family’s
oldest car does not appear to have a significant impact on the probability of
purchasing a new car, once income is included in the model.
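
Each z value in the summary is the estimate divided by its standard error, and the p-value is the two-sided tail area of the standard normal distribution. A small sketch that recomputes them from the printed (rounded) values, with illustrative variable names:

```r
# Estimates and standard errors copied from the summary() output above.
est <- c(income = 0.06773, age = 0.59863)
se  <- c(income = 0.02806, age = 0.39007)

z <- est / se                                  # Wald z statistics
p <- 2 * pnorm(abs(z), lower.tail = FALSE)     # two-sided p-values

print(round(cbind(z = z, p = p), 4))
```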

To compute how the odds of purchasing a car change with each explanatory
variable, exponentiate the coefficients using the command:

> exp(coef(results))
(Intercept)      income         age
0.008744682 1.070079093 1.819627221

To create 95% (Wald) confidence intervals for the odds ratios, type:

> exp(confint.default(results))
                   2.5 %    97.5 %
(Intercept) 0.0001420897 0.5381773
income      1.0128238514 1.1305710
age         0.8471457285 3.9084695

We see that the odds ratio corresponding to income is 1.070 (95% CI: (1.013,
1.131)). This implies that if we fix the age of the oldest car, increasing family
income by one thousand dollars multiplies the odds of purchasing a new car by
1.07, i.e., it increases the odds by about 7%.
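
The odds ratio and its interval can be reproduced by exponentiating the estimate and the limits of a normal-theory interval on the log-odds scale. The sketch below does this for income using the rounded values from the summary output. The variable names and the $10,000 comparison are illustrative additions.

```r
# Estimate and standard error for income, copied from the summary() output.
b_income  <- 0.06773
se_income <- 0.02806

or_income <- exp(b_income)                                        # odds ratio per $1,000
ci_income <- exp(b_income + c(-1, 1) * qnorm(0.975) * se_income)  # Wald 95% CI

# Percentage change in the odds for a one-unit ($1,000) increase in income.
pct_change <- 100 * (or_income - 1)

# Odds ratio for a larger change, e.g. an extra $10,000 of income.
or_income_10 <- exp(10 * b_income)

print(round(c(OR = or_income, lower = ci_income[1], upper = ci_income[2]), 3))
```

Note that the effect is multiplicative on the odds scale: ten additional units of income multiply the odds by exp(10*b), not by ten times the one-unit change.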

We are often interested in using the fitted logistic regression curve to estimate
probabilities and construct confidence intervals for these estimates. We can do
this using the function predict.glm. The usage is similar to that of the function
predict which we previously used when working on multiple linear regression
problems. The main difference is the option type, which tells R which type of
prediction is required. The default predictions are given on the logit scale (i.e.
predictions are made in terms of the log odds), while using type = "response"
gives the predicted probabilities.

To predict the probability that a family with an annual income of $53 thousand
and whose oldest car is 1 year old will purchase a new car in the next year, type:

> pi.hat = predict.glm(results, data.frame(income=53, age=1),
                       type="response", se.fit=TRUE)
> pi.hat$fit
[1] 0.3656668

This tells us that the predicted probability is 0.37. In order to obtain confidence
intervals we instead need to work on the logit scale and thereafter transform the
results into probabilities. To create a 95% confidence interval for the estimate,
type:

> l.hat = predict.glm(results, data.frame(income=53, age=1), se.fit=TRUE)

> ci = c(l.hat$fit - 1.96*l.hat$se.fit, l.hat$fit + 1.96*l.hat$se.fit)

To transform the results to probabilities type:

> exp(ci)/(1+exp(ci))
[1] 0.1145063 0.7198689

For a family with an annual income of $53 thousand and whose oldest car is 1
year old, the estimated probability of purchasing a new car is 0.366. A 95% CI is
given by (0.115, 0.720).
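
As a check on the predicted probability, the same number can be obtained by hand from the fitted coefficients. The sketch below uses the rounded coefficients, so it agrees with the predict.glm result only to about three decimal places. The variable names are illustrative.

```r
# Rounded coefficients copied from the fitted model above.
b <- c(intercept = -4.73931, income = 0.06773, age = 0.59863)

# Linear predictor (log odds) for income = 53 and age = 1.
eta <- b[["intercept"]] + b[["income"]] * 53 + b[["age"]] * 1

# plogis() is the inverse logit, identical to exp(eta) / (1 + exp(eta)).
p_hat <- plogis(eta)
print(round(p_hat, 3))

# The same transformation maps logit-scale interval limits to probabilities.
inv_logit_manual <- exp(eta) / (1 + exp(eta))
```

Using plogis() in place of the explicit exp(ci)/(1+exp(ci)) expression gives the same result and avoids overflow for large values of the linear predictor.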

B. Poisson Regression

Data is often collected in counts (e.g. the number of heads in 12 flips of a coin or
the number of car thefts in a city during a year). Many discrete response
variables have counts as possible outcomes. Binomial counts are the number of
successes in a fixed number of trials, n. Poisson counts are the number of
occurrences of some event in a certain interval of time (or space). While Binomial
counts only take values between 0 and n, Poisson counts have no upper bound.

We now consider a nonlinear regression model where the response outcomes
are discrete counts that follow a Poisson distribution. Poisson regression
provides a model that describes how the mean response μ changes as a
function of one or more explanatory variables. To perform Poisson regression in
R, we use the command:

> glm(response ~ explanatory_variables, family=poisson)

Note that we specified the family to be poisson, which tells R to perform Poisson
regression.
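
With the default log link of the poisson family, the model can be written as follows. Here μ_i is the mean response for observation i, and x_{i1}, ..., x_{ip} and β_0, ..., β_p are the explanatory variables and coefficients. This is standard notation assumed here for illustration.

```latex
Y_i \sim \text{Poisson}(\mu_i),
\qquad
\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},
\qquad
\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}).
```

Under this model a one-unit increase in x_{ij}, holding the other variables fixed, multiplies the mean by exp(β_j).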

Ex. Researchers studied 41 male African elephants over a period of 8 years.

The age of the elephant at the beginning of the study and the number of
successful matings during the 8 years were recorded. We assume the number of
matings follows a Poisson distribution, where the mean depends on the age of
the elephant in question.

We can fit a Poisson regression model using the following code:

> dat = read.table("elephants.txt", header=TRUE)

> attach(dat)
> results = glm(mating ~ age, family=poisson)
> summary(results)

Call:
glm(formula = mating ~ age, family = poisson)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.58201    0.54462  -2.905  0.00368 **
age          0.06869    0.01375   4.997 5.81e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 75.372 on 40 degrees of freedom

Residual deviance: 51.012 on 39 degrees of freedom
AIC: 156.46

Number of Fisher Scoring iterations: 5

To determine whether there is a significant relationship between the mean
number of matings and the age of the elephants we test H0: β1=0. The test
statistic is z=4.997 (p-value<0.0001). Hence, it appears that age does impact the
mean number of matings.
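
The z statistic can be recomputed from the printed estimate and standard error. As an alternative, a likelihood ratio test can be formed from the null and residual deviances, in the same way as for the logistic model. The sketch below uses the rounded values from the summary output. The variable names are illustrative.

```r
# Values copied from the summary() output above.
b_age  <- 0.06869
se_age <- 0.01375

z_age <- b_age / se_age
p_age <- 2 * pnorm(abs(z_age), lower.tail = FALSE)

# Deviance-based alternative: compare null and residual deviance (1 df).
lrt_stat <- 75.372 - 51.012
lrt_p    <- pchisq(lrt_stat, df = 40 - 39, lower.tail = FALSE)

print(c(z = z_age, p = p_age, lrt = lrt_stat, lrt.p = lrt_p))
```

Both tests lead to the same conclusion here, with p-values far below 0.0001.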

To see how the mean number of matings increases per year use the commands:

> beta =coef(results)

> beta
(Intercept)         age
-1.58200796  0.06869281
> exp(beta[2])
     age
1.071107

To create a 95% confidence interval for the estimate, type:

> exp(confint.default(results))
                 2.5 %    97.5 %
(Intercept) 0.07069036 0.5977577
age         1.04263544 1.1003563

Hence, each additional year of age is associated with a 7.1% increase in the mean
number of matings. A 95% confidence interval for this multiplicative change is
(1.043, 1.100), which represents a 4.3% to 10.0% increase.
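
To see where these numbers come from, the sketch below evaluates the fitted mean exp(b0 + b1*age) at two ages one year apart and takes their ratio. The ages 30 and 31, the ten-year comparison and the variable names are illustrative choices, not values from the study.

```r
# Coefficients and standard error copied from the output above.
b0    <- -1.58200796
b1    <-  0.06869281
se_b1 <-  0.01375

# Fitted mean number of matings at a given age: exp(b0 + b1 * age).
mean_matings <- function(age) exp(b0 + b1 * age)

rate_ratio <- mean_matings(31) / mean_matings(30)   # equals exp(b1) for any pair of ages one year apart
ci_ratio   <- exp(b1 + c(-1, 1) * qnorm(0.975) * se_b1)

# Ratio of means for elephants ten years apart in age.
ratio_10yr <- exp(10 * b1)

print(round(c(ratio = rate_ratio, lower = ci_ratio[1], upper = ci_ratio[2]), 3))
```

The ratio does not depend on the starting age, which is why a single percentage increase per year summarizes the effect of age in this model.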
