
Discrete and count models

Suryakant Yadav, IIPS, Mumbai


Types of Discrete data
• Nominal (e.g., gender, ethnic background, religious or political
affiliation)
• Ordinal (e.g., extent of agreement, school letter grades)
• Quantitative variables with relatively few values (e.g., number of
times married)
• Did you get the flu? (Yes or No) -- is a binary nominal categorical
variable
• What was the severity of your flu? (Low, Medium, or High) -- is an
ordinal categorical variable
Nominal variables
Discrete Distributions
• Bernoulli distribution
$f(x) = \begin{cases} \pi & \text{for } x = 1 \\ 1 - \pi & \text{for } x = 0 \\ 0 & \text{otherwise} \end{cases}$

Equivalently, $f(x) = \pi^{x}(1-\pi)^{1-x}$ for $x = 0, 1$

$E(X) = 1\cdot\pi + 0\cdot(1-\pi) = \pi$

$V(X) = E(X^{2}) - [E(X)]^{2} = \pi(1-\pi)$
Binomial Distribution
Suppose that X1, X2, …, Xn are independent and identically distributed
(iid) Bernoulli random variables, each having the distribution
$f(x) = \pi^{x}(1-\pi)^{1-x}$ for $x = 0, 1$ and $0 \le \pi \le 1$.
Let $X = X_1 + \cdots + X_n$; then $X \sim \mathrm{Bin}(n, \pi)$.
The binomial distribution has PMF
$f(x) = \dfrac{n!}{x!\,(n-x)!}\,\pi^{x}(1-\pi)^{n-x}$, for $x = 0, 1, 2, 3, \ldots, n$ and $0 \le \pi \le 1$
$E(X) = n\pi$ and $V(X) = n\pi(1-\pi)$
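As a quick numerical check (not part of the original slides; a minimal Python sketch assuming numpy and scipy are available), the moments of Bin(n, π) can be compared with the formulas above and with a simulated sum of iid Bernoulli draws:

import numpy as np
from scipy import stats

n, pi = 20, 0.3
X = stats.binom(n, pi)
print(X.mean(), n * pi)                 # E(X) = n*pi
print(X.var(), n * pi * (1 - pi))       # V(X) = n*pi*(1 - pi)

# A sum of n iid Bernoulli(pi) draws behaves like a single Bin(n, pi) draw.
rng = np.random.default_rng(0)
sums = rng.binomial(1, pi, size=(100_000, n)).sum(axis=1)
print(sums.mean(), sums.var())          # close to n*pi and n*pi*(1 - pi)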
Hypergeometric distribution
• Suppose there's a population of n objects with $n_1$ of type 1 (success) and $n_2 = n - n_1$ of type 2 (failure), and m (less than n) objects are sampled without replacement from this population. Then the number of successes X among the sample is a hypergeometric random variable with PMF
$f(x) = \dfrac{\binom{n_1}{x}\binom{n_2}{m-x}}{\binom{n}{m}}, \quad x \in [\max(0,\, m - n_2),\ \min(n_1,\, m)]$
$E(X) = \dfrac{m\,n_1}{n}$ and $\mathrm{Var}(X) = m\,\dfrac{n_1}{n}\,\dfrac{n_2}{n}\,\dfrac{n-m}{n-1}$
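The same kind of check works for the hypergeometric formulas (a minimal sketch, not from the slides, assuming scipy; note that scipy's hypergeom uses M for the population size, n for the number of type-1 objects, and N for the number of draws):

from scipy import stats

n, n1, m = 50, 20, 10                   # population size, type-1 objects, draws
n2 = n - n1
H = stats.hypergeom(M=n, n=n1, N=m)
print(H.mean(), m * n1 / n)                                   # E(X)
print(H.var(), m * (n1 / n) * (n2 / n) * (n - m) / (n - 1))   # Var(X)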
Poisson distribution
• The PMF of a Poisson distribution is given by
$f(x) = P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \ \text{and } \lambda > 0$
• Poisson is also the limiting case of the binomial.
Suppose that X∼Bin (n,π) and let n→∞ and π→0 in such a way
that nπ→λ where λ is a constant.
Then, in the limit, X∼Poisson(λ).
• That is, if n is large and π is small, then
$\dfrac{n!}{x!\,(n-x)!}\,\pi^{x}(1-\pi)^{n-x} \approx \dfrac{\lambda^{x} e^{-\lambda}}{x!}$, where $\lambda = n\pi$.
• Another interesting property of the Poisson distribution is
that E(X)=V(X)=λ.
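The limiting relationship can be seen numerically (a minimal sketch, not from the slides, assuming scipy): as n grows with λ = nπ held fixed, the binomial PMF approaches the Poisson PMF.

from scipy import stats

lam, x = 2.0, 3
for n in (10, 100, 10_000):
    pi = lam / n
    # binomial probability of x successes vs the Poisson(lam) probability
    print(n, stats.binom.pmf(x, n, pi), stats.poisson.pmf(x, lam))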
Negative-Binomial distribution
• The PMF of the Negative-Binomial distribution is
$f(x) = \dbinom{x + k - 1}{x}\,\pi^{k}(1-\pi)^{x}$, for $x = 0, 1, \ldots$
where the expectation and variance are given by
$E(X) = \dfrac{k(1-\pi)}{\pi} = \mu$ and $V(X) = \mu + \dfrac{\mu^{2}}{k}$
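A minimal sketch (not from the slides, assuming scipy; the size parameter k above was not legible in the source and is treated here as the standard negative-binomial size parameter): scipy's nbinom(k, π) counts failures before the k-th success, and its variance satisfies V(X) = μ + μ²/k.

from scipy import stats

k, pi = 5, 0.4
NB = stats.nbinom(k, pi)
mu = k * (1 - pi) / pi
print(NB.mean(), mu)                 # E(X) = k(1 - pi)/pi = mu
print(NB.var(), mu + mu**2 / k)      # V(X) = mu + mu^2/k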
Multinomial distribution

The PMF of a Multinomial distribution is:
$f(x_1, \ldots, x_k) = \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\,\pi_1^{x_1}\pi_2^{x_2}\cdots\pi_k^{x_k}$, where $x = (x_1, \ldots, x_k)$.

In addition to the mean and variance of $X_j$, given by
$E(X_j) = n\pi_j$ and $V(X_j) = n\pi_j(1-\pi_j)$,
there is also a covariance between different outcomes $X_i$ and $X_j$:
$\mathrm{Cov}(X_i, X_j) = -n\pi_i\pi_j$
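The negative covariance between cells can be checked by simulation (a minimal sketch, not from the slides, assuming numpy):

import numpy as np

rng = np.random.default_rng(0)
n = 30
pi = np.array([0.2, 0.3, 0.5])
draws = rng.multinomial(n, pi, size=200_000)
print(np.cov(draws[:, 0], draws[:, 1])[0, 1])   # empirical Cov(X1, X2)
print(-n * pi[0] * pi[1])                       # theoretical -n*pi_1*pi_2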
Sampling Schemes

The following sampling methods correspond to the distributions considered:
• Unrestricted sampling (corresponds to the Poisson distribution)
• Sampling with fixed total sample size (corresponds to the Binomial or Multinomial distributions)
Poisson Sampling
• Poisson sampling assumes that the random mechanism to generate the
data can be described by a Poisson distribution. It is useful for
modeling counts or events that occur randomly over a fixed period of
time or in a fixed space.
• Let X be the number of goals scored in a professional soccer game. We may model this as X ∼ Poisson(λ):
• $P(X = x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
• The parameter λ represents the expected number of goals in the game
or the long-run average among all possible such games.
The Poisson Model (distribution) Assumptions:
• Independence: Events must be independent (e.g. the number of goals
scored by a team should not make the number of goals scored by
another team more or less likely.)
• Homogeneity: The mean number of goals scored is assumed to be the
same for all teams.
• Time period (or space) must be fixed
• Note: mean and variance of Poisson distribution are the same;
E(X)=Var(X)=λ.
• However, in practice, the observed variance is usually larger than the
theoretical variance and in the case of Poisson, larger than its mean.
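A simple empirical check of the E(X) = Var(X) assumption on a vector of observed counts (a minimal sketch, not from the slides, assuming numpy; the goal counts here are simulated, not real data):

import numpy as np

rng = np.random.default_rng(0)
goals = rng.poisson(lam=2.5, size=500)   # hypothetical goal counts per game
print(goals.mean(), goals.var(ddof=1))   # roughly equal if the Poisson model holds
# A sample variance well above the sample mean suggests overdispersion.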
Binomial Sampling
• Data are collected on a pre-determined number of units and are then classified according to two levels of a categorical variable; thus a binomial sampling scheme emerges.
• Binomial distributions are characterized by two parameters:
• n, which is fixed, where n denotes the number of trials or the total sample size
• 𝜋 , which usually denotes a probability of "success".
• Binomial Model (distribution) is based on three assumptions:
• Fixed n: the total number of trials/events, (or total sample size) is fixed.
• Each event has two possible outcomes: referred to as "success" or "failure"
• Independent and Identical Events/Trials:
• Identical trials mean that the probability of success is the same for each trial.
• Independent means that the outcome of one trial does not affect the outcome
of the other.
Multinomial Sampling

• Multinomial sampling is a generalization of Binomial sampling.
• Data are collected on a pre-determined number of individuals or trials and classified into one of k categorical outcomes.
• Multinomial Model (distribution) assumptions:
• the n trials are independent
• the parameter vector π remains constant from trial to trial.
Note: these assumptions are violated when there is clustering in the data.
Maximum Likelihood Estimation

• Maximum Likelihood Estimation (MLE): a statistical method used to estimate the parameters of a probability distribution by maximizing the likelihood function.
• Identifies the values of parameters that make the observed data most
probable.
• The likelihood function is essentially the distribution of a random
variable (or joint distribution of all values if a sample of the random
variable is obtained) viewed as a function of the parameter(s).
Bernoulli and Binomial Likelihoods
• Consider a random sample of n Bernoulli random variables, $X_1, \ldots, X_n$, each with PMF
• $f(x_i) = \pi^{x_i}(1-\pi)^{1-x_i}; \quad x_i = 0, 1$
• The likelihood function is the joint distribution of these sample values, which we can write by independence as
• $\ell(\pi) = f(x_1, \ldots, x_n; \pi) = \pi^{\sum_i x_i}\,(1-\pi)^{\,n - \sum_i x_i}$
• where $\ell(\pi)$ is the probability of observing $X_1, \ldots, X_n$ as a function of $\pi$, and the maximum likelihood estimate (MLE) of $\pi$ is the value of $\pi$ that maximizes this probability function.
• Equivalently, L(𝜋)=log ℓ(𝜋) is maximized at the same value and can be used
interchangeably.
The likelihood function for the sample of Bernoulli random variables depends only on their sum, which we can write as $Y = \sum_i X_i$.
Since Y has a binomial distribution with n trials and success probability $\pi$, we can write its log-likelihood function as
• $L(\pi) = \log\!\left[\binom{n}{y}\,\pi^{y}(1-\pi)^{n-y}\right]$
• The only difference between this log-likelihood function and that for the Bernoulli sample is the presence of the binomial coefficient $\binom{n}{y}$.
• But since that coefficient doesn't depend on $\pi$, it has no influence on the MLE and may be neglected.
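A minimal sketch (not from the slides, assuming numpy and scipy): because the log-likelihood depends on the data only through y = Σ xᵢ, maximizing it numerically recovers the closed-form MLE π̂ = y/n.

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])   # hypothetical Bernoulli sample
n, y = len(x), x.sum()

def neg_loglik(pi):
    # negative Bernoulli/binomial log-likelihood (binomial coefficient omitted)
    return -(y * np.log(pi) + (n - y) * np.log(1 - pi))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y / n)                            # numerical maximizer vs y/n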
Goodness-of-Fit Test
• A goodness-of-fit test, is done to measure how well the observed data
correspond to the fitted (assumed) model.
• Like in linear regression, the goodness-of-fit test compares the
observed values to the expected (fitted or predicted) values.
• A goodness-of-fit statistic tests the following hypothesis:
• $H_0$: the model $M_0$ fits
• $H_A$: the model $M_0$ does not fit (or, some other model $M_A$ fits)
• Most often the observed data represent the fit of the saturated
model, the most complex model possible with the given data.
• Example: Consider 30 throws of a die. We want to test the hypothesis that the six faces are equally probable by comparing the observed frequencies to those expected under the assumed model: $X \sim \mathrm{Multi}(n = 30, \pi_0)$, where $\pi_0 = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6)$.
• This can be thought of as simultaneously testing whether the probability in each cell is equal to a specified value.
• In this case, $H_0: \pi = \pi_0$, where the alternative hypothesis is that any of these elements differ from the null value.
Test Statistics
• Pearson Goodness-of-fit Test Statistic
• The Pearson goodness-of-fit statistic is: $X^2 = \sum_j \dfrac{(O_j - E_j)^2}{E_j}$
• Likelihood-ratio Test Statistic: $G^2 = -2\log\dfrac{\ell_0}{\ell_1} = -2(L_0 - L_1)$
• Note: $X^2$ and $G^2$ are both functions of the observed data X and a vector of probabilities $\pi_0$.
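For the die example, both statistics and their chi-square p-values can be computed directly (a minimal sketch, not from the slides, assuming numpy and scipy; the observed counts are invented for illustration):

import numpy as np
from scipy import stats

observed = np.array([3, 7, 5, 10, 2, 3])                 # hypothetical counts, n = 30
expected = observed.sum() * np.full(6, 1 / 6)             # E_j = n * pi_0j = 5 per cell

X2 = ((observed - expected) ** 2 / expected).sum()         # Pearson statistic
G2 = 2 * (observed * np.log(observed / expected)).sum()    # likelihood-ratio statistic
df = len(observed) - 1                                     # k - 1 = 5
print(X2, stats.chi2.sf(X2, df))
print(G2, stats.chi2.sf(G2, df))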
Testing the Goodness-of-Fit
• $X^2$ and $G^2$ both measure how closely the model, in this case $\mathrm{Mult}(n, \pi_0)$, "fits" the observed data. Both have an approximate chi-square distribution with $k - 1$ degrees of freedom when $H_0$ is true. This allows us to use the chi-square distribution to find critical values and p-values for establishing statistical significance.
• If the sample proportions $\hat{\pi}_j$ (i.e., the saturated model) are exactly equal to the model's $\pi_{0j}$ for cells $j = 1, 2, \ldots, k$, then $O_j = E_j$ for all j, and both $X^2$ and $G^2$ will be zero. That is, the model fits perfectly.
• If the sample proportions $\hat{\pi}_j$ deviate from the $\pi_{0j}$'s, then $X^2$ and $G^2$ are both positive. Large values of $X^2$ and $G^2$ mean that the data do not agree well with the assumed/proposed model $M_0$.
Residuals
Pearson Residuals
• Pearson Goodness-of-fit Test Statistic
• The Pearson goodness-of-fit statistic can be written as $X^2 = \sum_j r_j^2$,
where $r_j = \dfrac{O_j - E_j}{\sqrt{E_j}}$ is called the Pearson residual for cell j, and it compares the observed with the expected counts.
• The sign (positive or negative) indicates whether the observed frequency in cell j is higher or lower than the value implied under the null model, and the magnitude indicates the degree of departure.
Deviance Residuals
• Although not as intuitive as the $X^2$ statistic, the deviance statistic $G^2 = \sum_j d_j^2$ can be regarded as the sum of squared deviance residuals, where
• $d_j = \mathrm{sign}(X_j - n\pi_{0j})\,\sqrt{\left|\,2 X_j \log\dfrac{X_j}{n\pi_{0j}}\right|}$
• where the sign function can take three values:
• $-1$ if $(X_j - n\pi_{0j}) < 0$,
• $0$ if $(X_j - n\pi_{0j}) = 0$, or
• $+1$ if $(X_j - n\pi_{0j}) > 0$.
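Continuing the same hypothetical die counts (a minimal sketch, not from the slides, assuming numpy): the squared Pearson residuals sum to X² and the squared deviance residuals sum to G².

import numpy as np

observed = np.array([3, 7, 5, 10, 2, 3])
expected = observed.sum() * np.full(6, 1 / 6)

r = (observed - expected) / np.sqrt(expected)              # Pearson residuals
d = np.sign(observed - expected) * np.sqrt(
    np.abs(2 * observed * np.log(observed / expected)))    # deviance residuals
print(r, (r ** 2).sum())   # sum of squares equals X^2
print(d, (d ** 2).sum())   # sum of squares equals G^2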
Model Diagnostics
The goodness-of-fit statistics tell us how well a particular model fits the data, but they don't tell us much about why a model may fit poorly. To assess the lack of fit, we need to look at regression diagnostics.
The standard linear regression model is given by
$y_i \sim N(\mu_i, \sigma^2)$, with $\mu_i = x_i^{T}\beta$.
The two crucial features of this model are
1. the assumed mean structure, $\mu_i = x_i^{T}\beta$, and
2. the assumed constant variance $\sigma^2$ (homoscedasticity).
• The most common diagnostic tool is the residuals, the
difference between the estimated and observed values of the
dependent variable.

• The most common way to check these assumptions is to fit the model and then plot the residuals versus the fitted values $\hat{y}_i = x_i^{T}\hat{\beta}$.
Overdispersion
• Overdispersion is an important concept in the analysis of discrete data.
Many times, data admit more variability than expected under the
assumed distribution. The extra variability not predicted by the
generalized linear model random component reflects overdispersion.
• Overdispersion occurs because the mean and variance components of
a GLM are related and depend on the same parameter that is being
predicted through the predictor set.
• Overdispersion is not an issue in ordinary linear regression.
• In a linear regression model $y_i \sim N(x_i^{T}\beta, \sigma^2)$, the variance $\sigma^2$ is estimated independently of the mean function $x_i^{T}\beta$. With discrete response variables, the possibility for overdispersion exists because the commonly used distributions specify particular relationships between the variance and the mean.
• In the context of logistic regression, overdispersion occurs when the discrepancies between the observed responses $y_i$ and their predicted values $\hat{\mu}_i = n_i\hat{\pi}_i$ are larger than what the binomial model would predict.
• Overdispersion arises when the $n_i$ Bernoulli trials that are summarized in a line of the dataset are
• not identically distributed (i.e., the success probabilities vary from
one trial to the next), or
• not independent (i.e., the outcome of one trial influences the
outcomes of other trials).
• In practice, it is impossible to distinguish non-identically distributed
trials from non-independence.
Adjusting for Overdispersion
• Adjusting for overdispersion comes from the theory of quasi-
likelihood.
• Quasilikelihood has come to play a very important role in modern
statistics.
• e.g., Generalized Estimating Equations (GEE) for longitudinal data, because such methods do not require the specification of a full parametric model.
• In the quasilikelihood approach, we must first specify the "mean
function" which determines how μ = 𝐸 Y is related to the covariates.
• In the context of logistic regression, the mean function is
$\mu_i = n_i\,\dfrac{\exp(x_i^{T}\beta)}{1 + \exp(x_i^{T}\beta)}$,
which implies $\log\dfrac{\pi_i}{1-\pi_i} = x_i^{T}\beta$.
Note:
• we must specify the "variance function," which determines the
relationship between the variance of the response variable and its
mean.
• There is no overdispersion for ungrouped data; overdispersion is not possible if $n_i = 1$.
• If $y_i$ only takes values 0 and 1, then it must be distributed as Bernoulli($\pi_i$), and its variance must be $\pi_i(1-\pi_i)$.
• There is no other distribution with support {0,1}. Therefore, with
ungrouped data, we should always assume scale=1 and not try to
estimate a scale parameter and adjust for overdispersion.
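A minimal sketch (not from the slides, assuming statsmodels; the grouped data below are invented): for grouped binomial responses, refitting the GLM with scale='X2' estimates the dispersion from Pearson's chi-square and inflates the standard errors accordingly, while scale stays fixed at 1 for ungrouped 0/1 data.

import numpy as np
import statsmodels.api as sm

y = np.array([3, 8, 15, 20, 28])                  # successes y_i
n = np.array([30, 30, 30, 30, 30])                # group sizes n_i
x = sm.add_constant(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))

model = sm.GLM(np.column_stack([y, n - y]), x, family=sm.families.Binomial())
fit_fixed = model.fit()                # scale fixed at 1 (pure binomial)
fit_x2 = model.fit(scale="X2")         # scale estimated as Pearson chi2 / df
print(fit_fixed.bse)                   # binomial-model standard errors
print(fit_x2.bse)                      # inflated if the data are overdispersed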
Receiver Operating Characteristic Curve (ROC)
• A Receiver Operating Characteristic Curve (ROC) is a standard
technique for summarizing classifier performance over a range of trade-
offs between true positive (TP) and false positive (FP) error rates.

• The ROC curve is a plot of sensitivity (the ability of the model to predict an event correctly) versus (1 − specificity) for the possible cut-off classification probability values $\pi_0$.
• For logistic regression we can create a 2×2 classification table of predicted values from the model for the response, $\hat{y}_i = 0$ or 1, versus the true value of $y_i = 0$ or 1.
• The prediction $\hat{y}_i = 1$ depends on some cut-off probability $\pi_0$.
• For example, $\hat{y}_i = 1$ if $\hat{\pi}_i > \pi_0$ and $\hat{y}_i = 0$ if $\hat{\pi}_i \le \pi_0$.
• The most common value is $\pi_0 = 0.5$. Then sensitivity $= P(\hat{y}_i = 1 \mid y_i = 1)$ and specificity $= P(\hat{y}_i = 0 \mid y_i = 0)$.
• The ROC curve is more informative than the classification table since it summarizes the predictive power for all possible $\pi_0$.
• The position of the ROC on the graph reflects the accuracy of the
diagnostic test. It covers all possible thresholds (cut-off points).
• The ROC of random guessing lies on the diagonal line.
• The ROC of a perfect diagnostic technique is a point at the upper left
corner of the graph, where the TP proportion is 1.0 and the FP
proportion is 0.
• The Area Under the Curve (AUC), also referred to as the index of accuracy (A) or the concordance index, c, is an accepted traditional performance metric for a ROC curve.
• The higher the area under the curve, the better the prediction power of the model. c = 0.8 can be interpreted to mean that a randomly selected individual from the positive group has a test value larger than that for a randomly chosen individual from the negative group 80 percent of the time.
• In the example plotted here, the area under the curve of c = 0.746 indicates good predictive power of the model.
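A minimal sketch (not from the slides, assuming scikit-learn and numpy; the outcomes and fitted probabilities are invented): roc_curve sweeps the cut-off π₀ over all values, and the AUC is the concordance index c.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])           # observed y_i
pi_hat = np.array([0.10, 0.40, 0.35, 0.80, 0.20,
                   0.70, 0.65, 0.30, 0.90, 0.50])            # fitted probabilities
fpr, tpr, cutoffs = roc_curve(y_true, pi_hat)    # 1 - specificity, sensitivity
print(roc_auc_score(y_true, pi_hat))             # c; 0.5 corresponds to random guessing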
Generalized Linear Model
• The term "general" linear model (GLM) usually refers to conventional
linear regression models for a continuous response variable given
continuous and/or categorical predictors. It includes multiple linear
regression, as well as ANOVA and ANCOVA (with fixed effects only).
The form is $y_i \sim N(x_i^{T}\beta, \sigma^2)$, where $x_i$ contains known covariates and $\beta$ contains the coefficients to be estimated. These models are fit by least squares and weighted least squares.
GLM
• The term "generalized" linear model (GLIM or GLM) refers to a larger
class of models popularized by McCullagh and Nelder (1982, 2nd
edition 1989). In these models, the response variable yi is assumed to
follow an exponential family distribution with mean μi, which is
assumed to be some (often nonlinear) function of xiTβ. Some would
call these “nonlinear” because μi is often a nonlinear function of the
covariates, but McCullagh and Nelder consider them to be linear
because the covariates affect the distribution of yi only through the
linear combination xiTβ.
GLM
• There are three components to any GLM:
• Random Component - specifies the probability distribution of the response
variable; e.g., normal distribution for Y in the classical regression model, or
binomial distribution for Y in the binary logistic regression model. This is the only
random component in the model; there is not a separate error term.
• Systematic Component - specifies the explanatory variables (x1,x2,…,xk) in the
model, more specifically, their linear combination; e.g., β0+β1x1+β2x2, as we
have seen in a linear regression and the logistic regression.
• Link Function, η or g(μ) - specifies the link between the random and the systematic components. It indicates how the expected value of the response relates to the linear combination of explanatory variables; e.g., η = g(E(Yi)) = E(Yi) for classical regression, or η = log(π/(1−π)) = logit(π) for logistic regression.
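The three components can be made explicit in code (a minimal sketch, not from the slides, assuming statsmodels and numpy; the data are simulated): a Binomial random component, a systematic component β0 + β1x, and the logit link.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))      # true pi_i under a logit model
y = rng.binomial(1, p)                        # random component: binary response

X = sm.add_constant(x)                        # systematic component: beta0 + beta1*x
family = sm.families.Binomial(link=sm.families.links.Logit())   # link function
fit = sm.GLM(y, X, family=family).fit()
print(fit.params)                             # estimates of (beta0, beta1)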
GLM
• Assumptions
• The data Y1,Y2,…,Yn are independently distributed, i.e., cases are independent.
• The dependent variable Yi does NOT need to be normally distributed, but it typically assumes a
distribution from an exponential family (e.g. binomial, Poisson, multinomial, normal, etc.).
• A GLM does NOT assume a linear relationship between the response variable and the explanatory
variables, but it does assume a linear relationship between the transformed expected response in
terms of the link function and the explanatory variables; e.g., for binary logistic
regression logit(π)=β0+β1x.
• Explanatory variables can be nonlinear transformations of some original variables.
• The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in many
cases given the model structure.
• Errors need to be independent but NOT normally distributed.
• Parameter estimation uses maximum likelihood estimation (MLE) rather than ordinary least
squares (OLS).

Diagnostic analysis
• predict [type] newvarname [if exp] [in range] [, statistic]
• where statistic is
• xb — fitted values; the default
• pr(a,b) — Pr(a < y < b)
• e(a,b) — E(y | a < y < b)
• ystar(a,b) — E(y*)
• (for pr, e, and ystar: a and b may be numbers or variables; a==. means -inf; b==. means +inf)
• cooksd — Cook's distance
• leverage | hat — leverage (diagonal elements of hat matrix)
• residuals — residuals
• rstandard — standardized residuals
• rstudent — Studentized (jackknifed) residuals
• stdp — standard error of the prediction
• stdf — standard error of the forecast
• stdr — standard error of the residual
• (*) covratio — COVRATIO
• (*) dfbeta(varname) — DFBETA for varname
• (*) dfits — DFITS
• (*) welsch — Welsch distance
Diagnostic Test
• Detecting Unusual and Influential Data
• predict — used to create predicted values, residuals, and measures of
influence.
• rvpplot — graphs a residual-versus-predictor plot.
• rvfplot — graphs residual-versus-fitted plot.
• lvr2plot — graphs a leverage-versus-squared-residual plot.
• dfbeta — calculates DFBETAs for all the independent variables in the linear
model.
• avplot — graphs an added-variable plot, a.k.a. partial regression plot.
Diagnostic Test
• Tests for Normality of Residuals
• kdensity — produces kernel density plot with normal distribution overlayed.
• pnorm — graphs a standardized normal probability (P-P) plot.
• qnorm — plots the quantiles of varname against the quantiles of a normal
distribution.
• iqr — resistant normality check and outlier identification.
• swilk — performs the Shapiro-Wilk W test for normality.
Diagnostic Test
• Tests for Heteroscedasticity
• rvfplot — graphs residual-versus-fitted plot.
• hettest — performs Cook and Weisberg test for heteroscedasticity.
• whitetst — computes the White general test for Heteroscedasticity.
• Tests for Multicollinearity
• vif — calculates the variance inflation factor for the independent variables in
the linear model.
• collin — calculates the variance inflation factor and other multicollinearity
diagnostics
Diagnostic Test
• Tests for Non-Linearity
• acprplot — graphs an augmented component-plus-residual plot.
• cprplot — graphs a component-plus-residual plot, a.k.a. partial residual plot.
• Tests for Model Specification
• linktest — performs a link test for model specification.
• ovtest — performs regression specification error test (RESET) for omitted
variables.
