Modeling Count Data

Other Statistics Books by Joseph M. Hilbe

Generalized Linear Models and Extensions (2001, 2007, 2013 – with J. Hardin)
Generalized Estimating Equations (2002, 2013 – with J. Hardin)
Negative Binomial Regression (2007, 2011)
Logistic Regression Models (2009)
Solutions Manual for Logistic Regression Models (2009)
R for Stata Users (2010 – with R. Muenchen)
Methods of Statistical Model Estimation (2013 – with A. Robinson)
A Beginner’s Guide to GLM and GLMM with R: A Frequentist and Bayesian
Perspective for Ecologists (2013 – with A. Zuur and E. Ieno)
Quasi–Least Squares Regression (2014 – with J. Shults)
Practical Predictive Analytics and Decisioning Systems for Medicine (2014 –
with L. Miner, P. Bolding, M. Goldstein, T. Hill, R. Nisbit, N. Walton, and
G. Miner)
MODELING COUNT DATA
JOSEPH M. HILBE
Arizona State University
and
Jet Propulsion Laboratory,
California Institute of Technology
32 Avenue of the Americas, New York, NY 10013-2473, USA

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107611252

© Joseph M. Hilbe 2014

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2014

A catalog record for this publication is available from the British Library.
ISBN 978-1-107-02833-3 Hardback
ISBN 978-1-107-61125-2 Paperback
Additional resources for this publication at www.cambridge.org/9781107611252
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or
third-party Internet web sites referred to in this publication and does not guarantee that any content on
such web sites is, or will remain, accurate or appropriate.
Contents

Preface

Chapter 1
Varieties of Count Data

Some Points of Discussion

1.1 What Are Counts?
1.2 Understanding a Statistical Count Model
1.2.1 Basic Structure of a Linear Statistical Model
1.2.2 Models and Probability
1.2.3 Count Models
1.2.4 Structure of a Count Model
1.3 Varieties of Count Models
1.4 Estimation – the Modeling Process
1.4.1 Software for Modeling
1.4.2 Maximum Likelihood Estimation
1.4.3 Generalized Linear Models and IRLS Estimation
1.5 Summary

Chapter 2
Poisson Regression

Some Points of Discussion

2.1 Poisson Model Assumptions
2.2 Apparent Overdispersion
2.3 Constructing a “True” Poisson Model
2.4 Poisson Regression: Modeling Real Data
2.5 Interpreting Coefficients and Rate Ratios
2.5.1 How to Interpret a Poisson Coefficient and Associated Statistics
2.5.2 Rate Ratios and Probability
2.6 Exposure: Modeling over Time, Area, and Space
2.7 Prediction
2.8 Poisson Marginal Effects
2.8.1 Marginal Effect at the Mean
2.8.2 Average Marginal Effects
2.8.3 Discrete Change or Partial Effects
2.9 Summary

Chapter 3
Testing Overdispersion

Some Points of Discussion

3.1 Basics of Count Model Fit Statistics
3.2 Overdispersion: What, Why, and How
3.3 Testing Overdispersion
3.3.1 Score Test
3.3.2 Lagrange Multiplier Test
3.3.3 Chi2 Test: Predicted versus Observed Counts
3.4 Methods of Handling Overdispersion
3.4.1 Scaling Standard Errors: Quasi-count Models
3.4.2 Quasi-likelihood Models
3.4.3 Sandwich or Robust Variance Estimators
3.4.4 Bootstrapped Standard Errors
3.5 Summary

Chapter 4
Assessment of Fit

Some Points of Discussion

4.1 Analysis of Residual Statistics
4.2 Likelihood Ratio Test
4.2.1 Standard Likelihood Ratio Test
4.2.2 Boundary Likelihood Ratio Test
4.3 Model Selection Criteria
4.3.1 Akaike Information Criterion
4.3.2 Bayesian Information Criterion
4.4 Setting up and Using a Validation Sample
4.5 Summary and an Overview of the Modeling Process
4.5.1 Summary of What We Have Thus Far Discussed

Chapter 5
Negative Binomial Regression

Some Points of Discussion

5.1 Varieties of Negative Binomial Models
5.2 Negative Binomial Model Assumptions
5.2.1 A Word Regarding Parameterization of the Negative Binomial
5.3 Two Modeling Examples
5.3.1 Example: rwm1984
5.3.2 Example: medpar
5.4 Additional Tests
5.4.1 General Negative Binomial Fit Tests
5.4.2 Adding a Parameter – NB-P Negative Binomial
5.4.3 Modeling the Dispersion – Heterogeneous Negative Binomial
5.5 Summary

Chapter 6
Poisson Inverse Gaussian Regression

Some Points of Discussion

6.1 Poisson Inverse Gaussian Model Assumptions
6.2 Constructing and Interpreting the PIG Model
6.2.1 Software Considerations
6.2.2 Examples
6.3 Summary – Comparing Poisson, NB, and PIG Models

Chapter 7
Problems with Zeros

Some Points of Discussion

7.1 Counts without Zeros – Zero-Truncated Models
7.1.1 Zero-Truncated Poisson (ZTP)
7.1.2 Zero-Truncated Negative Binomial (ZTNB)
7.1.3 Zero-Truncated Poisson Inverse Gaussian (ZTPIG)
7.1.4 Zero-Truncated NB-P (ZTNBP)
7.1.5 Zero-Truncated Poisson Log-Normal (ZTPLN)
7.1.6 Zero-Truncated Model Summary
7.2 Two-Part Hurdle Models
7.2.1 Poisson and Negative Binomial Logit Hurdle Models
7.2.2 PIG-Logit and Poisson Log-Normal Hurdle Models
7.2.3 PIG-Poisson Hurdle Model
7.3 Zero-Inflated Mixture Models
7.3.1 Overview and Guidelines
7.3.2 Fit Tests for Zero-Inflated Models
7.3.3 Fitting Zero-Inflated Models
7.3.4 Good and Bad Zeros
7.3.5 Zero-Inflated Poisson (ZIP)
7.3.6 Zero-Inflated Negative Binomial (ZINB)
7.3.7 Zero-Inflated Poisson Inverse Gaussian (ZIPIG)
7.4 Summary – Finding the Optimal Model

Chapter 8
Modeling Underdispersed Count Data – Generalized Poisson

Some Points of Discussion

Chapter 9
Complex Data: More Advanced Models

Types of Data and Problems Dealt with in This Chapter

9.1 Small and Unbalanced Data – Exact Poisson Regression
9.2 Modeling Truncated and Censored Counts
9.2.1 Truncated Count Models
9.2.2 Censored Count Models
9.2.3 Poisson-Logit Hurdle at 3 Model
9.3 Counts with Multiple Components – Finite Mixture Models
9.4 Adding Smoothing Terms to a Model – GAM
9.5 When All Else Fails: Quantile Count Models
9.6 A Word about Longitudinal and Clustered Count Models
9.6.1 Generalized Estimating Equations (GEEs)
9.6.2 Mixed-Effects and Multilevel Models
9.7 Three-Parameter Count Models
9.8 Bayesian Count Models – Future Directions of Modeling?
9.9 Summary

Appendix: SAS Code

Bibliography
Index
Preface

Modeling Count Data is written for the practicing researcher who has a
reason to analyze and draw sound conclusions from modeling count data.
More specifically, it is written for an analyst who needs to construct a count
response model but is not sure how to proceed.
A count response model is a statistical model for which the dependent, or
response, variable is a count. A count is understood as a nonnegative discrete
integer ranging from zero to some specified greater number. This book aims
to be a clear and understandable guide to the following points:

• How to recognize the characteristics of count data
• Understanding the assumptions on which a count model is based
• Determining whether data violate these assumptions (e.g., overdispersion),
  why this is so, and what can be done about it
• Selecting the most appropriate model for the data to be analyzed
• Constructing a well-fitted model
• Interpreting model parameters and associated statistics
• Predicting counts, rate ratios, and probabilities based on a model
• Evaluating the goodness-of-fit for each model discussed

There is indeed a lot to consider when selecting the best-fitted model for
your data. I will do my best in these pages to clarify the foremost concepts
and problems unique to modeling counts. If you follow along carefully, you
should have a good overview of the subject and a basic working knowledge
needed for constructing an appropriate model for your study data. I focus
on understanding the nature of the most commonly used count models and
on the problem of dealing with both over- and underdispersion, as well
as on Poisson and negative binomial regression and their many variations.
However, I also introduce several other count models that have not had
much use in research because of the unavailability of commercial software
for their estimation. In particular, I also discuss models such as the Poisson
inverse Gaussian, generalized Poisson, varieties of three-parameter negative
binomial, exact Poisson, and several other count models that will provide
analysts with an expanded ability to better model the data at hand. Stata
and/or R software and guidelines are provided for all of the models discussed
in the text.
I am supposing that most people who will use this book start with little to no
background in modeling count response data, although readers are expected
to have a working knowledge of a major statistical software package, as well
as a basic understanding of statistical regression. I provide an overview of
maximum likelihood and iterative reweighted least squares (IRLS) regression
in Sections 1.4.2 and 1.4.3, which assume an elementary understanding of
calculus, but I consider these two sections as optional to our discussion. They
are provided for those who are interested in how the majority of models we
discuss are estimated. I recommend that you read these sections, even if
you do not have the requisite mathematical background. I have attempted
to present the material so that it will still be understood. Various terms are
explained in these sections that will be used throughout the text.
Seasoned statisticians can also learn new material from the text, but I have
specifically written it for researchers or analysts, as well as students at the
upper-division to graduate levels, who want an entry-level book that focuses
on the practical aspects of count modeling. The book is also addressed to
statistical and predictive analytics consultants who find themselves faced
with a project involving the modeling of count data, as well as to anyone with
an interest in this class of statistical models. It is written in guidebook form,
with lots of bullet points, tables, and complete statistical programming code
for all examples discussed in the book.
Many readers of this book may be acquainted with my text Negative Bino-
mial Regression (Cambridge University Press), which was first published in
2007. A substantially enhanced second edition was published in 2011. That
text addresses nearly every count model for which there existed major statis-
tical software support at the time of the book’s publication. Negative Binomial
Regression was primarily written for those who wish to understand the mathematics
behind the models as well as the specifics and applications of each
model. I recommend it for those who wish to go beyond the discussions
found in Modeling Count Data.
I primarily use two statistical software packages to demonstrate examples of
the count models discussed in the book. First, the Stata 13 statistical package
(http://www.stata.com) is used throughout the text to display example model
output. I show both Stata code and output for most of the modeling examples.
I also provide R code (www.r-project.org) in the text that replicates, as far
as possible, the Stata output. R output is also given when helpful. There are
also times when no current Stata code exists for the modeling of a particular
procedure. In such cases, R is used. SAS code for a number of the models
discussed in the book is provided in the Appendix. SAS does not support
many of the statistical functions and tests discussed later in the book, but its
count-modeling capability is growing each year. I will advise readers on the
book’s web site as software for count models is developed for these packages.
I should mention that I have used Stat/Transfer 12 (2013, Circle Systems)
when converting data between statistical software packages. The user is able
to convert between 37 different file formats, including those used in this book.
It is a very helpful tool for those who must use more than one statistical or
spreadsheet file.
Many of the Stata statistical models discussed in the text are offered as a
standard part of the commercial package. Users have also contributed count
model “commands” for the use of the greater Stata community. Developers
of the user-authored commands used in the book are acknowledged at the
first use of the software. James Hardin and I have both authored and coau-
thored a number of the more advanced count models found in the book.
Many derive from our 2012 text Generalized Linear Models and Extensions,
3rd edition (Stata Press; Chapman & Hall/CRC). Several others in the book are
based on commands we developed in 2013 for journal article publications.
I should also mention that we also coauthored the current version of Stata’s
glm command (2001), although Stata has subsequently enhanced various
options over the past 12 years as new versions of Stata were released. Several
of the R functions and scripts used in the book were coauthored by Andrew
Robinson and me for use in our book (Hilbe and Robinson 2013). Data sets
and functions for this book, as well as for Hilbe (2011), are available in the
COUNT package, which may be downloaded from any CRAN mirror site. I also
recommend installing msme (Hilbe and Robinson), also available on CRAN.
I have also posted all of my user-authored Stata commands and functions, as
well as all data sets used in the book, on the book’s web site at the following
address: http://works.bepress.com/joseph_hilbe/. The book’s page with Cambridge
University Press is at www.cambridge.org/9781107611252.
The data files used for examples in the book are real data. The rwm1984 and
medpar data sets are used extensively throughout the book. Other data
sets used include titanic, heart, azcabgptca, smoking, fishing,
fasttrakg, rwm5yr, nuts, and azprocedure. The data are defined
where first used. The medpar, rwm5yr, and titanic data are used more
than other data in the book. The medpar data are from the 1991 Arizona
Medicare files for two diagnostic groups related to cardiovascular procedures.
I prepared medpar in 1993 for use in workshops I gave at the time. The
rwm5yr data consist of 19,609 observations from the German Health Reform
data covering the five-year period of 1984–1988. Not all patients were in the
study for all five years. The count response is the number of visits made by
a patient to the doctor during that calendar year. The rwm1984 data were
created from rwm5yr, with only data from 1984 included – one patient, one
observation. The well-known titanic data set is from the 1912 Titanic ship
disaster survival data. It is in grouped format with survived as the response.
The predictors are age (adult vs. child), gender (male vs. female), and class
(1st-, 2nd-, and 3rd-class passengers). Crew members have been excluded.
I advise the reader that there are parts of Chapter 3 that use or adapt
text from the first edition of Negative Binomial Regression (Hilbe 2007a),
which is now out of print, as it was superseded by Hilbe (2011). Chapter 2
incorporates two tables that were also used in the first edition. I received very
good feedback regarding these sections and found no reason to change them
for this book. Now that the original book is out of print, these sections would
be otherwise lost.
I wish to acknowledge five eminent colleagues and friends in the truest
sense who in various ways have substantially contributed to this book, either
indirectly while working together on other projects or directly: James Hardin,
director of the Biostatistics Collaborative Unit and professor, Department of
Statistics and Epidemiology, University of South Carolina School of Medicine;
Andrew Robinson, director, Australian Centre of Excellence for Risk Analy-
sis (ACERA), Department of Mathematics and Statistics, University of Mel-
bourne, Australia; Alain Zuur, senior statistician and director of Highland
Statistics Ltd., UK; Peter Bruce, CEO, Institute for Statistics Education (Statis-
tics.com); and John Nelder, late Emeritus Professor of Statistics, Imperial
College, UK. John passed away in 2010, just shy of his eighty-sixth birthday;
our many discussions over a 20-year period are sorely missed. He definitely
spurred my interest in the negative binomial model. I am fortunate to have
known and to have worked with these fine statisticians. Each has enriched
my life in different ways.
Others who have contributed to this book’s creation include Valerie
Troiano and Kuber Dekar of the Institute for Statistics Education; Professor
William H. Greene, Department of Economics, New York University, and
author of the Limdep econometrics software; Dr. Gordon Johnston, Senior
Statistician, SAS Institute, author of the SAS Genmod Procedure; Professor
Milan Hejtmanek, Seoul National University, and Dr. Digant Gupta, M.D.,
director, Outcomes Research, Cancer Treatment Centers of America, both of
whom provided long hours reviewing early drafts of the book manuscript.
Helen Wheeler, production editor for Cambridge University Press, is also
gratefully acknowledged. A special acknowledgment goes to Patricia Branton
of Stata Corp., who has provided me with statistical support and friendship
for almost a quarter of a century. She has been a part of nearly every text I
have written on statistical modeling, including this book.
There have been many others who have contributed to this book as well,
but space limits their express acknowledgment. I intend to list all contributors
on the book’s web site. I invite readers to contact me regarding comments
or suggestions about the book. You may email me at [email protected] or at the
address on my BePress web site listed earlier.
Finally, I must also acknowledge Diana Gillooly, senior editor for mathe-
matical sciences with Cambridge University Press, who first encouraged me
to write this monograph. She has provided me with excellent feedback in my
attempt to develop a thoroughly applied book on count models. Her help
with this book has been invaluable and goes far beyond standard editorial
obligations. I also wish to thank my family for yet again supporting my writ-
ing of another book. My appreciation goes to my wife, Cheryl L. Hilbe, my
children and grandchildren, and our white Maltese dog, Sirr, who sits close
by my side for hours while I am typing. I dedicate this book to Cheryl for her
support and feedback during the time of this book’s preparation.

Joseph M. Hilbe
Florence, Arizona
August 12, 2013
CHAPTER 1

Varieties of Count Data

SOME POINTS OF DISCUSSION

• What are counts? What are count data?
• What is a linear statistical model?
• What is the relationship between a probability distribution function (PDF)
  and a statistical model?
• What are the parameters of a statistical model? Where do they come from,
  and can we ever truly know them?
• How does a count model differ from other regression models?
• What are the basic count models, and how do they relate to one another?
• What is overdispersion, and why is it considered to be the fundamental
  problem when modeling count data?

1.1 WHAT ARE COUNTS?

When discussing the modeling of count data, it’s important to clarify exactly
what is meant by a count, as well as “count data” and “count variable.” The
word “count” is typically used as a verb meaning to enumerate units, items,
or events. We might count the number of road kills observed on a stretch of
highway, how many patients died at a particular hospital within 48 hours of
having a myocardial infarction, or how many separate sunspots were observed
in March 2013. “Count data,” on the other hand, is a plural noun referring
to observations made about events or items that are enumerated. In statistics,
count data refer to observations that have only nonnegative integer values
ranging from zero to some greater undetermined value. Theoretically, counts
can range from zero to infinity, but they are always limited to some lesser
distinct value – generally the maximum value of the count data being modeled.
When the data being modeled consist of a large number of distinct values,
even if they are positive integers, many statisticians prefer to model the counts
as if they were continuous data. We address this issue later in the book.
A “count variable” is a specific list or array of count data. Again, such
observations can only take on nonnegative integer values. However, in a
statistical model, a response variable is understood as being a random variable,
meaning that the particular set of enumerated values or counts could be other
than they are at any given time. Moreover, the values are assumed to be
independent of one another (i.e., they show no clear evidence of correlation).
This is an important criterion for count model data, and it stems from the fact
that the observations of a probability distribution are independent. On the
other hand, predictor values are fixed; that is, they are given as facts, which
are used to better understand the response.
We will be primarily concerned with four types of count variables in this
book. They are:

1. A count or enumeration of events
2. A count of items or events occurring within a period of time or over a
   number of periods
3. A count of items or events occurring in a given geographical or spatial area
   or over various defined areas
4. A count of the number of people having a particular disease, adjusted by
   the size of the population at risk of contracting the disease
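
As a rough illustration of these definitions, the following R sketch checks whether a variable qualifies as count data (nonnegative integers) and shows how a count of the fourth type can be expressed relative to the population at risk. The function name is_count and all data values are hypothetical; they are not taken from the data sets used in this book.

```r
# Hypothetical helper: TRUE when every value is a nonnegative whole number.
is_count <- function(x) {
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(x == floor(x))
}

# Made-up data: disease cases in four regions and the population at risk.
cases      <- c(3, 0, 12, 7)
population <- c(1500, 800, 6000, 2500)

is_count(cases)          # a valid count variable
is_count(c(1.5, 2, 3))   # not counts: contains a non-integer value
is_count(c(-1, 0, 4))    # not counts: contains a negative value

# Counts of the fourth type are read relative to the population at risk,
# for example as cases per 1,000 people.
rate_per_1000 <- 1000 * cases / population
rate_per_1000
```

The check only describes the form of the data. Whether the observations are independent, as a count model assumes, has to be judged from how the data were collected.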

Understanding how count data are modeled, and what modeling entails, is
discussed in the following section. For readers with little background in linear
models, I strongly suggest that you read through Chapter 1 even though var-
ious points may not be fully understood. Then re-read the chapter carefully.
The essential concepts and relationships involved in modeling should then
be clear. In Chapter 1, I have presented the fundamentals of modeling, focus-
ing on normal and count model estimation from several viewpoints, which
should at the end provide the reader with a sense of how the modeling process
is to be understood when applied to count models. If certain points are still
unclear, I am confident that any problem areas regarding the assessment of
fit will be clear by the time you read through Chapter 4, on assessing model
fit. Those who have taken a statistics course in which linear regression is
examined should have no problem following the presentation.

1.2 UNDERSTANDING A STATISTICAL COUNT MODEL

1.2.1 Basic Structure of a Linear Statistical Model

Statistics may be generically understood as the science of collecting and
analyzing data for the purposes of classification and prediction, and of
attempting to quantify and understand the uncertainty inherent in the
phenomena underlying the data.
A statistical model describes the relationship between one or more variables
on the basis of another variable or variables. For the purpose of the models we
discuss in this book, a statistical model can be understood as the mathematical
explanation of a count variable on the basis of one or more explanatory
variables.1 Such statistical models are stochastic, meaning that they are based
on probability functions. The traditional linear regression model is based on
the normal or Gaussian probability distribution and can be formalized in the
most simple case as
$$Y = \beta_0 + \beta X + \varepsilon \tag{1.1}$$

where Y is called the response, outcome, dependent, or sometimes just the y
variable. We use the term “response” or y when referring to the variable being
modeled. X is the explanatory or predictor variable that is used to explain
the occurrence of y. β is the coefficient for X. It is a slope describing the rate
of change in the response based on a one-unit change in X, holding other
predictor values constant (usually at their mean values). β0 is the intercept,
which provides a value to fitted y, or ŷ, when, or if, X has the value of 0. ε
(epsilon) is the error term, which reflects the fact that the relationship between
X and Y is not exact, or deterministic. For the normal or linear regression
model, the errors are Gaussian or normally distributed, which is the most
widely used and basic probability distribution in statistics. ε is also referred to
as the residual term.

1 A model may consist of only the response variable, unadjusted by explanatory
variables. Such a model is estimated by modeling the response on the intercept.
For example, using R: lm(y ~ 1); using Stata: reg y.
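
The intercept-only model described in footnote 1 can be sketched in R as follows. The six sbp values are the ones used in the smoking example later in this section; the object name fit0 is an assumption made for illustration. With no predictors in the model, the estimated intercept is simply the sample mean of the response.

```r
# Six systolic blood pressure values from the smoking example in this section.
sbp <- c(131, 132, 122, 119, 123, 115)

# Intercept-only model: the response is modeled on the constant alone.
fit0 <- lm(sbp ~ 1)

coef(fit0)    # estimated intercept
mean(sbp)     # the same value: the sample mean of the response
```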
When a linear regression has more than one predictor, it may be schema-
tized by giving a separate beta and X value for each predictor, as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{1.2}$$

Statisticians usually convert equation (1.2) to one that has the left-hand side
being the predicted or expected mean value of the response, based on the sum
of the predictors and coefficients. Each associated coefficient and predictor is
called a regression term:

$$\hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{1.3}$$

$$\hat{\mu} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \tag{1.4}$$

Notice that the error became part of the expected or predicted mean response.
The “^”, or hat, over y and μ (mu) indicates that this is an estimated value. From
this point on, I use the symbol μ to refer to the predicted value, without a hat.
Understand, though, that when we are estimating a parameter or a statistic,
a hat should go over it. The true unknown parameter, on the other hand, has
no hat. You will also at times see the term E(y) used to mean “estimated y.” I
will not use it here.
In matrix form, where the individual terms of the regression are expressed
in a single term, we have

$$\mu = \beta X \tag{1.5}$$

with βX being understood as the summation of the various terms, including
the intercept. As you may recall, the intercept is defined as β0(1), or simply
β0. It is therefore a term that can be placed within the single matrix term
βX. When models become complicated, viewing them in matrix form is
the only feasible way to see the various relationships involved. I should
mention that sometimes you see the term βX expressed as xβ. I reserve this
symbol for another part of the model, which we discuss a bit later in this
section.
Let’s look at example data (smoking). Suppose that we have a six-observation
model consisting of the following variables:

sbp: systolic blood pressure of subject
male: 1 = male; 0 = female
smoker: 1 = history of smoking; 0 = no history of smoking
age: age of subject

Using Stata statistical software, we display a linear regression of sbp on male,
smoker, and age, producing the following (nohead suppresses the display of
header statistics).

STATA CODE
. regress sbp male smoker age, nohead
------------------------------------------------------------------------
   sbp |     Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
-------+----------------------------------------------------------------
  male |  4.048601    .2507664    16.14    0.004     2.96964    5.127562
smoker |  6.927835    .1946711    35.59    0.001    6.090233    7.765437
   age |  .4698085      .02886    16.28    0.004    .3456341     .593983
 _cons |  104.0059    .7751557   134.17    0.000    100.6707    107.3411
------------------------------------------------------------------------

Continuing with Stata, we may obtain the predicted value, μ, which is the
estimated mean systolic blood pressure, and display the predictor values
together with μ (mu) as

. predict mu
. l // 'l' is an abbreviation for list
     +--------------------------------------+
     | sbp   male   smoker   age         mu |
     |--------------------------------------|
  1. | 131      1        1    34   130.9558 |
  2. | 132      1        1    36   131.8954 |
  3. | 122      1        0    30   122.1488 |
  4. | 119      0        0    32   119.0398 |
  5. | 123      0        1    26   123.1488 |
  6. | 115      0        0    23   114.8115 |
     +--------------------------------------+

To see exactly what this means, we sum the terms of the regression. The
intercept term is also summed, but its values are set at 1. The _b[] term
captures the coefficient from the results saved by the software. For the
intercept, _b[_cons] adds the intercept term, slope[1], to the other values. The
term xb is also commonly referred to as the linear predictor.

. gen xb = _b[male]*male + _b[smoker]*smoker + _b[age]*age + _b[_cons]
. l
     +-------------------------------------------------+
     | sbp   male   smoker   age         mu         xb |
     |-------------------------------------------------|
  1. | 131      1        1    34   130.9558   130.9558 |
  2. | 132      1        1    36   131.8954   131.8954 |
  3. | 122      1        0    30   122.1488   122.1488 |
  4. | 119      0        0    32   119.0398   119.0398 |
  5. | 123      0        1    26   123.1488   123.1488 |
  6. | 115      0        0    23   114.8115   114.8115 |
     +-------------------------------------------------+

The intercept is defined correctly; check by displaying it. The value is
indeed 1,

. di _cons
1

whereas _b[_cons] is the constant slope of the intercept as given in the
preceding regression output:

. di _b[_cons] /* intercept slope */
104.00589

Using R, we may obtain the same results with the following code:

R CODE
> sbp <- c(131,132,122,119,123,115)
> male <- c(1,1,1,0,0,0)
> smoker <- c(1,1,0,0,1,0)
> age <- c(34,36,30,32,26,23)
> summary(reg1 <- lm(sbp ~ male+smoker+age))
<results not displayed>

Predicted values may be obtained by

> mu <- predict(reg1)
> mu
       1        2        3        4        5        6
130.9558 131.8954 122.1487 119.0398 123.1487 114.8115

As was done with the Stata code, we may calculate the linear predictor, which
is the same as μ, by first extracting the coefficients

> cof <- reg1$coef
> cof
(Intercept)        male      smoker         age
104.0058910   4.0486009   6.9278351   0.4698085

and then the linear predictor, xb. Each coefficient can be identified with [ ].
The values are identical to mu.

> xb <- cof[1] + cof[2]*male + cof[3]*smoker + cof[4]*age
> xb
[1] 130.9558 131.8954 122.1487 119.0398 123.1487 114.8115
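
Equation (1.5) can also be checked directly. The sketch below re-enters the six observations so that it runs on its own, builds the design matrix with model.matrix(), whose first column of 1s carries the intercept, and multiplies it by the coefficient vector. The object names X, beta, and mu_matrix are assumptions introduced for illustration. R writes the product as X %*% beta, which is the summation of terms that equation (1.5) denotes by βX.

```r
sbp    <- c(131, 132, 122, 119, 123, 115)
male   <- c(1, 1, 1, 0, 0, 0)
smoker <- c(1, 1, 0, 0, 1, 0)
age    <- c(34, 36, 30, 32, 26, 23)
reg1   <- lm(sbp ~ male + smoker + age)

# Design matrix: a leading column of 1s for the intercept, then the predictors.
X    <- model.matrix(reg1)
beta <- coef(reg1)

# Matrix form of the linear predictor: each row of X multiplied by beta.
mu_matrix <- drop(X %*% beta)

all.equal(mu_matrix, predict(reg1))   # compares with the fitted values
```

Because the intercept enters as a column of 1s, it is handled like any other regression term, which is why it can be absorbed into the single matrix term.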

Notice the closeness of the observed response and predicted values. The
differences are

> diff <- sbp - mu
> diff
          1           2           3           4           5           6
 0.04418262  0.10456554 -0.14874816 -0.03976436 -0.14874816  0.18851252

When the observed values of the response are close to the predicted or expected
values, we call the model well fitted.
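
A minimal R sketch of this comparison, re-entering the data so that it runs on its own: the residuals are the observed response minus the fitted values, and their typical size summarizes how close the two are. The names res and rmse (root mean squared error) are assumptions made for illustration; rmse is not a statistic used in the text.

```r
sbp    <- c(131, 132, 122, 119, 123, 115)
male   <- c(1, 1, 1, 0, 0, 0)
smoker <- c(1, 1, 0, 0, 1, 0)
age    <- c(34, 36, 30, 32, 26, 23)
reg1   <- lm(sbp ~ male + smoker + age)

mu  <- fitted(reg1)           # predicted mean systolic blood pressure
res <- sbp - mu               # observed minus predicted

all.equal(res, resid(reg1))   # compares with R's own residuals
rmse <- sqrt(mean(res^2))     # typical distance between observed and predicted
rmse
```

With only six observations and four estimated coefficients, a close fit is to be expected here, so the example shows the calculation rather than evidence of a good model.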

1.2.2 Models and Probability

One of the points about statistical modeling rarely discussed is the
relationship of the data to a probability distribution. All parametric
statistical models are based on an underlying probability distribution. I
mentioned before that the normal or linear regression model is based on the
Gaussian, or normal, probability distribution (see example in Figure 1.1). It
is what defines the error terms. When we are attempting to estimate a least
squares regression or more sophisticated maximum likelihood model, we are
estimating the parameters of the underlying probability distribution that
characterize the data. These two foremost methods of estimation are described
in the next section of this opening chapter. The important point here is
always to remember that when modeling count data we are really estimating the
parameters of a probability distribution that we believe best represents the
data we are modeling. We are never able to knowingly determine the true parameters

