mvord: An R Package for Fitting Multivariate Ordinal Regression Models
Abstract
The R package mvord implements composite likelihood estimation in the class of multivariate ordinal regression models with a multivariate probit and a multivariate logit link. A flexible modeling framework for multiple ordinal measurements on the same subject is set up, which takes into consideration the dependence among the multiple observations by employing different error structures. Heterogeneity in the error structure across the subjects can be accounted for by the package, which allows for covariate dependent error structures. In addition, different regression coefficients and threshold parameters for each response are supported. If a reduction of the parameter space is desired, constraints on threshold as well as on the regression coefficients can be specified by the user. The proposed multivariate framework is illustrated by means of a credit risk application.
Keywords: composite likelihood estimation, correlated ordinal data, multivariate ordinal logit
regression model, multivariate ordinal probit regression model, R.
1. Introduction
The analysis of ordinal data is an important task in various areas of research. One of the most
common settings is the modeling of preferences or opinions (on a scale from, say, poor to very
good or strongly disagree to strongly agree). The scenarios involved range from psychology
(e.g., aptitude and personality testing), marketing (e.g., consumer preferences research) and
economics and finance (e.g., credit risk assessment for sovereigns or firms) to information
retrieval (where documents are ranked by the user according to their relevance) and medical
sciences (e.g., modeling of pain severity or cancer stages).
Most of these applications deal with correlated ordinal data, as typically multiple ordinal measurements or outcomes are available for a collection of subjects or objects (e.g., interviewees answering different questions, different raters assigning credit ratings to a firm, pain levels being recorded for patients repeatedly over a period of time, etc.). In such a multivariate setting, models which are able to deal with the correlation in the ordinal outcomes
are desired. One possibility is to employ a multivariate ordinal regression model where the
marginal distribution of the subject errors is assumed to be multivariate. Other options are
the inclusion of random effects in the ordinal regression model and conditional models (see
e.g., Fahrmeir and Tutz 2001).
Several ordinal regression models can be employed for the analysis of ordinal data, with
cumulative link models being the most popular ones (e.g., Tutz 2012; Christensen 2015a).
Other approaches include continuation-ratio or adjacent-category models (e.g., Agresti 2002,
2010). Different packages to analyze and model ordinal data are available in R (R Core Team
2018). For univariate ordinal regression models with fixed effects the function polr() of
the MASS package (Venables and Ripley 2002), the function clm() of the ordinal package
(Christensen 2015b), which supports scale effects as well as nominal effects, and the function
vglm() of the VGAM package (Yee 2010) are available. Another package which accounts for
heteroskedasticity is oglmx (Carroll 2016). Package ordinalNet (Wurm, Rathouz, and Hanlon
2017) offers tools for model selection by using an elastic net penalty, whereas package ordinalgmifs (Archer, Hou, Zhou, Ferber, Layne, and Gentry 2014) performs variable selection by using the generalized monotone incremental forward stagewise (GMIFS) method. Moreover, ordinal logistic models can be fitted by the functions lrm() and orm() in package rms (Harrell Jr 2017), while ordinal probit models can be fitted via Markov chain Monte Carlo methods by the function MCMCoprobit() in package MCMCpack (Martin, Quinn, and Park 2011).
An overview on ordinal regression models in other statistical software packages like Stata (StataCorp. 2018), SAS (SAS Institute Inc. 2018) or SPSS (SPSS Inc. 2018) is provided by Liu (2009). These packages include the Stata procedure OLOGIT, the SAS procedure PROC LOGISTIC and the SPSS procedure PLUM, which fit ordinal logistic regression models. The SPSS procedure PLUM additionally includes other link functions like probit, complementary log-log, cauchit and negative log-log. Ordinal models for multinomial data are available in the SAS procedure PROC GENMOD, while another implementation of ordinal logistic regression is
available in JMP (JMP 2018). In Python (Python Software Foundation 2018), package mord
(Pedregosa-Izquierdo 2015) implements ordinal regression methods.
While there are sufficient software tools in R which deal with the univariate case, ready-to-use packages for dealing with the multivariate case fall behind, mainly due to computational problems or lack of flexibility in the model specification. However, there are some R packages which support correlated ordinal data. One-dimensional normally distributed random effects in ordinal regression can be handled by the clmm() function of package ordinal (Christensen 2015b). Multiple possibly correlated random effects are implemented in package mixor (Hedeker, Archer, Nordgren, and Gibbons 2015). Note that this package uses multidimensional quadrature methods and estimation becomes infeasible for increasing dimension of the random effects. Bayesian multilevel models for ordinal data are implemented in package brms (Bürkner 2017). Multivariate ordinal probit models, where the subject errors are assumed to follow a multivariate normal distribution with a general correlation matrix, can be estimated with package PLordprob (Kenne Pagui, Canale, Genz, and Azzalini 2014), which uses maximum composite likelihood estimation. This package works well for standard applications but lacks flexibility. For example, the number of levels of the ordinal responses needs to be equal across all dimensions, threshold and regression coefficients are the same for all multiple measurements, and it does not account for missing observations in the outcome variable. Polychoric correlations, which are used to measure the association between two ordinal outcomes, can be estimated by the polychor() function of package polycor (Fox 2016), where a simple bivariate probit model without covariates is estimated using maximum likelihood estimation. At the time of writing, none of these packages supports covariate dependent error structures. A package which allows for different error structures in non-linear mixed effects models is nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2017).
In this model class, each of the ordinal responses is modeled as a categorized version
of an underlying continuous latent variable which is slotted according to some threshold
parameters. On the latent scale we assume a linear model for each of the underlying continuous
variables and the existence of a joint distribution for the corresponding error terms. A common
choice for this joint distribution is the multivariate normal distribution, which corresponds
to the multivariate probit link. We extend the available software in several directions. The
flexible modeling framework allows imposing constraints on threshold as well as regression
coefficients. In addition, various assumptions about the variance-covariance structure of the
errors are supported, by specifying different types of error structures. These include a general
correlation, a general covariance, an equicorrelation and an AR(1) error structure. The
general error structures can depend on a categorical covariate, while in the equicorrelation
and AR(1) structures both numerical and categorical covariates can be employed. Moreover,
in addition to the multivariate probit link, we implement a multivariate logit link for the class
of multivariate ordinal regression models.
This paper is organized as follows: Section 2 provides an overview of the model class and the estimation procedure, including model specification and identifiability issues. Section 3 presents the main functions of the package. A couple of worked examples are given in Section 4. Section 5 concludes.
$$\tilde{Y}_i = \beta_0 + x_i^\top \beta + \epsilon_i,$$
where −∞ ≡ θ0 < θ1 < · · · < θK−1 < θK ≡ ∞ are threshold parameters on the latent scale (see e.g., Agresti 2010; Tutz 2012). In such a setting the ordinal response variable Yi follows a multinomial distribution with parameter πi. Denote by πir the probability that observation i falls in category r. Then the cumulative link model (McCullagh 1980) is specified by:
$$P(Y_i \le r) = P(\beta_0 + x_i^\top \beta + \epsilon_i \le \theta_r) = F(\theta_r - \beta_0 - x_i^\top \beta) = \pi_{i1} + \cdots + \pi_{ir}.$$
Typical choices for the distribution function F are the normal and the logistic distributions.
where rij is a category out of Kj ordered categories and θj is a vector of suitable threshold parameters for outcome j with the following restriction: −∞ ≡ θj,0 < θj,1 < · · · < θj,Kj−1 < θj,Kj ≡ ∞. Note that in this setting binary observations can be treated as ordinal observations with two categories (Kj = 2).
The following linear model is assumed for the relationship between the latent variable Ỹij and the vector of covariates xij:
$$\tilde{Y}_{ij} = \beta_{j0} + x_{ij}^\top \beta_j + \epsilon_{ij}, \qquad (1)$$
where βj0 is an intercept term, βj = (βj1, . . . , βjp)⊤ is a vector of regression coefficients, both
corresponding to outcome j. We further assume the n subjects to be independent. Note that
the number of ordered categories Kj as well as the threshold parameters θj and the regression
coefficients βj are allowed to vary across outcome dimensions j ∈ J to account for possible
heterogeneity across the response variables.
Link functions  The dependence among the different responses is accounted for by assuming that, for each subject i, the vector of error terms ǫi = [ǫij]j∈Ji follows a suitable multivariate distribution. We consider two multivariate distributions which correspond to the multivariate probit and logit link functions. For the multivariate probit link, we assume that the errors follow a multivariate normal distribution: ǫi ∼ Nqi(0, Σi). A multivariate logit link is
constructed by employing a multivariate logistic distribution family with univariate logistic
margins and a t copula with certain degrees of freedom proposed by O’Brien and Dunson
(2004). For a vector z = (z1, . . . , zq)⊤, the multivariate logistic distribution function with ν degrees of freedom, mean µ and covariance matrix Σ is defined as:
$$F_{\nu,\mu,\Sigma}(z) = t_{\nu,R}\bigl(\{g_\nu((z_1-\mu_1)/\sigma_1), \ldots, g_\nu((z_q-\mu_q)/\sigma_q)\}^\top\bigr), \qquad (2)$$
where $t_{\nu,R}$ is the $q$-dimensional multivariate $t$ distribution with $\nu$ degrees of freedom and correlation matrix $R$ corresponding to $\Sigma$, $g_\nu(x) = t_\nu^{-1}(\exp(x)/(\exp(x)+1))$, with $t_\nu^{-1}$ the quantile function of the univariate $t$ distribution with $\nu$ degrees of freedom, and $\sigma_1^2, \ldots, \sigma_q^2$ the diagonal elements of $\Sigma$.
? employed this t copula based multivariate logistic family, while Nooraee, Abegaz, Ormel, Wit, and van den Heuvel (2016) used a multivariate t distribution with ν = 8 degrees of freedom as an approximation for this multivariate logistic distribution. The employed
distribution family differs from the conventional multivariate logistic distributions of Gumbel
(1961) or Malik and Abraham (1973) in that it offers a more flexible dependence structure
through the correlation matrix of the t copula, while still keeping the log odds interpretation
of the regression coefficients through the univariate logistic margins.
• Fixing the intercept βj0 (e.g., to zero), using flexible thresholds θj and fixing σij (e.g., to unity) ∀j ∈ Ji, ∀i ∈ {1, . . . , n};
• Leaving the intercept βj0 unrestricted, fixing one threshold parameter (e.g., θj,1 = 0) and fixing σij (e.g., to unity) ∀j ∈ Ji, ∀i ∈ {1, . . . , n};
• Fixing the intercept βj0 (e.g., to zero), fixing one threshold parameter (e.g., θj,1 = 0) and leaving σij unrestricted ∀j ∈ Ji, ∀i ∈ {1, . . . , n};
• Leaving the intercept βj0 unrestricted, fixing two threshold parameters (e.g., θj,1 = 0 and θj,2 = 1) and leaving σij unrestricted ∀j ∈ Ji, ∀i ∈ {1, . . . , n}.
Note that the first two options are the most commonly used in the literature. All of these
alternative model parameterizations are supported by the mvord package, allowing the user
to choose the most convenient one for each specific application. Table 2 in Section 3.5 gives
an overview on the identifiable parameterizations implemented in the package.
Basic model
The basic multivariate ordinal regression model assumes that the correlation (and possibly
variance, depending on the parameterization) parameters in the distribution function of the
ǫi are constant for all subjects i.
Correlation The dependence between the multiple measurements or outcomes can be cap-
tured by different correlation structures. Among them, we concentrate on the following three:
• The general correlation structure assumes different correlation parameters between pairs
of outcomes corr(ǫik , ǫil ) = ρkl . This error structure is among the most common in the
literature (e.g., Scott and Kanaroglou 2002; Bhat et al. 2010; Kenne Pagui and Canale
2016).
• The equicorrelation structure corr(ǫik , ǫil ) = ρ implies that the correlation between all
pairs of outcomes is constant.
• When faced with longitudinal data, especially when moderate to long subject-specific time series are available, an autoregressive correlation model of order one (AR(1)) can be employed. Given equally spaced time points, this AR(1) error structure implies an exponential decay in the correlation with the lag. If k and l are the time points when Yik and Yil are observed, then corr(ǫik, ǫil) = ρ^|k−l|.
Variance  If a parameterization with identifiable variance is used (see Section 2.3), in the basic model we assume that for each multiple measurement the variance is constant across all subjects (VAR(ǫij) = σj²).
Correlation  For each subject i and each pair (k, l) from the set Ji, the correlation parameter ρikl is assumed to depend on a vector si of m subject-specific covariates. In this paper we use the hyperbolic tangent transformation to reparameterize the linear term $\alpha_{0kl} + s_i^\top \alpha_{kl}$ in terms of a correlation parameter:
$$\frac{1}{2}\log\left(\frac{1+\rho_{ikl}}{1-\rho_{ikl}}\right) = \alpha_{0kl} + s_i^\top \alpha_{kl}, \qquad \rho_{ikl} = \frac{e^{2(\alpha_{0kl} + s_i^\top \alpha_{kl})} - 1}{e^{2(\alpha_{0kl} + s_i^\top \alpha_{kl})} + 1}.$$
If αkl = 0 for all k, l ∈ Ji , this model would correspond to the general correlation structure
in the basic model. Moreover, if α0kl = 0 and αkl = 0 for all k, l ∈ Ji , the correlation matrix
is the identity matrix and the responses are uncorrelated.
For the more parsimonious error structures of equicorrelation and AR(1), in the extended model the correlation parameters are modeled as:
$$\frac{1}{2}\log\left(\frac{1+\rho_i}{1-\rho_i}\right) = \alpha_0 + s_i^\top \alpha, \qquad \rho_i = \frac{e^{2(\alpha_0 + s_i^\top \alpha)} - 1}{e^{2(\alpha_0 + s_i^\top \alpha)} + 1}.$$
Variance  Similarly, one could model the heterogeneity among the subjects through the variance parameters VAR(ǫij) = σij² by employing the following linear model on the log-variance:
$$\log(\sigma_{ij}^2) = \gamma_{0j} + s_i^\top \gamma_j.$$
Note that other suitable link functions for the correlation and variance parameterizations could also be applied. The positive semi-definiteness of the correlation (or covariance) matrix Σi can be ensured by the use of special algorithms such as the one proposed by Higham (1988).
where $D_i = \prod_{j \in J_i} (\theta_{j,r_{ij}-1}, \theta_{j,r_{ij}})$ is a Cartesian product, wi are subject-specific non-negative weights (which are set to one in the default case) and fi,qi is the qi-dimensional density of the error terms ǫi. We approximate this full likelihood by a pairwise likelihood which is constructed from bivariate marginal distributions. If the number of observed outcomes for subject i is less than two (qi < 2), the univariate marginal distribution enters the likelihood.
The pairwise log-likelihood function is obtained by:
$$p\ell(\delta) = \sum_{i=1}^{n} w_i \Bigl[\mathbb{1}_{\{q_i \ge 2\}} \sum_{\substack{k < l \\ k,l \in J_i}} \log\bigl(P(Y_{ik} = r_{ik}, Y_{il} = r_{il})\bigr) + \mathbb{1}_{\{q_i = 1\}}\, \mathbb{1}_{\{k \in J_i\}} \log\bigl(P(Y_{ik} = r_{ik})\bigr)\Bigr]. \qquad (3)$$
Denoting by fi,1 and fi,2 the uni- and bivariate density functions corresponding to the error distribution, the uni- and bivariate probabilities are given by:
$$P(Y_{ik} = r_{ik}, Y_{il} = r_{il}) = \int_{\theta_{k,r_{ik}-1}}^{\theta_{k,r_{ik}}} \int_{\theta_{l,r_{il}-1}}^{\theta_{l,r_{il}}} f_{i,2}(\tilde{Y}_{ik}, \tilde{Y}_{il}; \delta)\, d\tilde{Y}_{ik}\, d\tilde{Y}_{il},$$
$$P(Y_{ik} = r_{ik}) = \int_{\theta_{k,r_{ik}-1}}^{\theta_{k,r_{ik}}} f_{i,1}(\tilde{Y}_{ik}; \delta)\, d\tilde{Y}_{ik}.$$
The maximum pairwise likelihood estimates δ̂pℓ are obtained by direct maximization of the
composite likelihood given in Equation 3. The threshold and error structure parameters to
be estimated are reparameterized such that unconstrained optimization can be performed.
where pℓi (δ) is the component of the pairwise log-likelihood corresponding to subject i and
pℓikl (δ) corresponds to subject i and pair (k, l).
In order to compare different models, the composite likelihood information criterion by Varin and Vidoni (2005) can be used: $CLIC(\delta) = -2\, p\ell(\hat{\delta}_{p\ell}) + k\, \mathrm{tr}\bigl(\hat{V}(\delta)\hat{H}(\delta)^{-1}\bigr)$, where k = 2 corresponds to CLAIC and k = log(n) to CLBIC.
One natural way to interpret ordinal regression models is to analyze partial effects, where one
is interested in how a marginal change in one variable xijv changes the outcome distribution.
The partial probability effects in the cumulative model are given by:
$$\delta^j_{r,v}(x_{ij}) = \frac{\partial P(Y_{ij} = r_{ij} \mid x_{ij})}{\partial x_{ijv}} = -\bigl(f_1(\theta_{j,r_{ij}} - x_{ij}^\top \beta_j) - f_1(\theta_{j,r_{ij}-1} - x_{ij}^\top \beta_j)\bigr)\, \beta_{jv},$$
where f1 is the density corresponding to F1 , xijv is the v-th element in xij and βjv is the v-th
element in βj . In case of discrete variables it is more appropriate to consider the changes in
probability before and after the change in the variable instead of the partial effects using:
$$\Delta P(Y_{ij} = r_{ij} \mid x_{ij}, \tilde{x}_{ij}) = P(Y_{ij} = r_{ij} \mid \tilde{x}_{ij}) - P(Y_{ij} = r_{ij} \mid x_{ij}),$$
where all elements of x̃ij are equal to xij except for the v-th element, which is equal to
x̃ijv = xijv + ∆xijv for the discrete change ∆xijv in the variable xv . We refer to Greene and
Hensher (2010) and Boes and Winkelmann (2006) for further discussion of the interpretation
of partial effects in ordered response models.
In the presence of the probit link function, we have the following relationship between the cumulative probabilities and the latent process:
$$\Phi^{-1}\bigl(P(Y_{ij} \le r_{ij} \mid x_{ij})\bigr) = \theta_{j,r_{ij}} - x_{ij}^\top \beta_j.$$
An increase of one unit in a variable xjv (given that all other variables are held constant)
changes the probit of the probability that category r or lower is observed by the value of
the coefficient βjv of this variable. In other words P(Yij ≤ rij |xij ), the probability that
category rij or lower is observed, changes by the increase/decrease in the distribution function.
Moreover, predicted probabilities for all ordered response categories can be calculated and
compared for given sets of explanatory variables.
In the presence of the logit link function, the regression coefficients of the underlying latent
process are scaled in terms of marginal log odds (McCullagh 1980):
$$\log\left(\frac{P(Y_{ij} \le r_{ij} \mid x_{ij})}{P(Y_{ij} > r_{ij} \mid x_{ij})}\right) = \theta_{j,r_{ij}} - x_{ij}^\top \beta_j.$$
For a one unit increase in one variable xjv holding all the others constant, we expect a change
of size of the coefficient βjv of this variable in the expected value on the log odds scale. Due
to the fact that the marginal effects of the odds ratios do not depend on the category, one
often exponentiates the coefficients in order to obtain the following convenient interpretation
in terms of odds ratios:
P(Yij ≤ rij |xij )/P(Yij > rij |xij )
= exp((x̃ij − xij )⊤ βj ).
P(Yij ≤ rij |x̃ij )/P(Yij > rij |x̃ij )
This means for a one unit increase in xjv , holding all the other variables constant, changes
the odds ratio by exp(βjv ). In other words, the odds after a one unit change in xjv are the
odds before the change multiplied by exp(−βjv ):
If the regression coefficients vary across the multiple responses, they cannot be compared directly, due to the fact that the measurement units of the underlying latent processes differ. Nevertheless, one possibility to compare coefficients is through the concept of importance. Reusens and Croux (2017) extend an approach for comparing coefficients of probit and logit models by Hoetker (2007) in order to compare the coefficients across repeated measurements. They analyze the importance ratio
$$R_{jv} = \frac{\beta_{jv}}{\beta_{j,\mathrm{base}}},$$
where βj,base is the coefficient of a base variable and v is one of the remaining p − 1 variables. This ratio can be interpreted as follows: a one unit increase in variable v has in expectation the same effect as an increase of Rjv units in the base variable. Another interpretation is the so-called compensating variation: the ratio is the required increase in the base variable that is necessary to compensate a one unit decrease in variable v in such a way that the score of the outcome remains the same. It is to be noted that the importance ratio Rjv depends on the scale of the variables xjv and xj,base. This implies that the comparison among the measurements j should be done only if the scales of these variables are equal across the multiple measurements. For this purpose, standardization of the covariates for each measurement should be employed.
3. Implementation
The mvord package contains six datasets and the built-in functions presented in Table 1.
Multivariate ordinal regression models in the R package mvord can be fitted using the main
function mvord(). Two different data structures can be passed on to the mvord() function
through the use of two different multiple measurement objects MMO and MMO2 in the left-
hand side of the model formula. MMO uses a long data format, which has the advantage that
it allows for varying covariates across multiple measurements. This flexibility requires the user to specify a subject index as well as a multiple measurement index. In contrast to MMO, the
multiple measurement object MMO2 has a simplified data structure but is only applicable in
settings where the covariates do not vary between the multiple measurements. In this case,
the multiple ordinal observations as well as the covariates are stored in different columns of
a data.frame. We refer to this data structure as wide data format.
For illustration purposes we use a worked example based on a simulated data set consisting of
100 subjects for which two multiple ordinal responses (Y1 and Y2), two continuous covariates
(X1 and X2) and two factor covariates (f1 and f2) are available. The ordinal responses each
have three categories labeled with 1, 2 and 3.
R> data("data_mvord_toy")
R> str(data_mvord_toy)
The data set data_mvord_toy has a wide format. We convert the data set into the long
format, where the first column contains the subject index i and the second column the multiple
measurement index j.
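One way to perform this conversion in base R is the following minimal sketch (the column names follow data_mvord_toy; the resulting object name data_toy_long is the one used below):

R> data_toy_long <- rbind(
+    data.frame(i = seq_len(nrow(data_mvord_toy)), j = 1, Y = data_mvord_toy$Y1,
+               data_mvord_toy[, c("X1", "X2", "f1", "f2")]),
+    data.frame(i = seq_len(nrow(data_mvord_toy)), j = 2, Y = data_mvord_toy$Y2,
+               data_mvord_toy[, c("X1", "X2", "f1", "f2")]))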
R> str(data_toy_long)
Function                                         Description

Fitting function
  mvord(formula, data, ...)                      Estimates the multivariate ordinal regression model.

Prediction functions
  predict(object, type, ...)                     Obtains different types of predicted or fitted values from the joint distribution of the responses for objects of class ‘mvord’.
  marginal_predict(object, type, ...)            Obtains different types of predictions or fitted values from the marginal distributions of the responses for objects of class ‘mvord’.
  joint_probabilities(object, response.cat, ...) For each subject, computes the joint probability of observing a predefined configuration of responses response.cat for objects of class ‘mvord’.

Utility functions
  coef(object, ...)                              Extracts the estimated regression coefficients.
  thresholds(object, ...)                        Extracts the estimated threshold coefficients.
  error_structure(object, type, ...)             Extracts for each subject the estimated parameters of the error structure.
  constraints(object)                            Extracts the constraint matrices corresponding to each regression coefficient.
  names_constraints(formula, data, ...)          Extracts the names of the regression coefficients in the model matrix.
  pseudo_R_squared(object, ...)                  Computes McFadden's pseudo R².

Table 1: This table summarizes fitting, prediction, utility functions and other generic methods implemented in mvord.
$ Y : int 1 3 3 1 2 1 2 2 2 3 ...
$ X1: num -0.789 0.93 2.804 1.445 -0.191 ...
$ X2: num 1.3653 -0.00982 -0.25878 3.90187 0.04958 ...
$ f1: Factor w/ 3 levels "A","B","C": 3 2 2 3 3 3 2 2 3 1 ...
$ f2: Factor w/ 2 levels "c1","c2": 2 2 2 1 2 2 1 2 2 1 ...
Data structure
In MMO we use a long format for the input of data, where each row contains a subject index i,
a multiple measurement index j, an ordinal observation Y and all the covariates (X1 to Xp).
This long format data structure is internally transformed to an n × q matrix of responses
which contains NA in the case of missing entries and a list of covariate matrices Xj for all
j ∈ J. This is performed by the multiple measurement object MMO(Y, i, j) which specifies
the column names of the subject index and the multiple measurement index in data. The
column containing the ordinal observations can contain integer or character values or can inherit from class (ordered) ‘factor’. When using the long data structure, this column is basically a concatenated vector of each of the multiple ordinal responses. Internally, this vector is then split according to the measurement index. Then the ordinal variable corresponding to each measurement index is transformed into an ordered ‘factor’. For an integer or a character vector the natural ordering is used (ascending, or alphabetical). If for character vectors the alphabetical order does not correspond to the ordering of the categories, the optional argument response.levels allows the user to specify the levels for each response explicitly. This is performed
by a list of length q, where each element contains the names of the levels of the ordered
categories in ascending (or if desired descending) order. If all the multiple measurements use
the same number of classes and same labeling of the classes, the column Y can be stored as
an ordered ‘factor’ (as it is often the case in longitudinal studies).
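For instance, a minimal sketch of such an explicit level specification for two hypothetical responses whose categories are labeled differently (the level names are purely illustrative):

response.levels = list(c("low", "medium", "high"),
                       c("C", "B", "A"))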
The order of the multiple measurements is needed when specifying constraints on the threshold
or regression parameters (see Sections 3.5 and 3.6). This order is based on the type of the
multiple measurement index column in data. For ‘integer’, ‘character’ or ‘factor’ the
natural ordering is used (ascending, or alphabetical). If a different order of the multiple
responses is desired, the multiple measurement index column should be an ordered factor
with a corresponding ordering of the levels.
Formula
The multiple measurement object MMO including the ordinal responses Y, the subject index i
and the multiple measurement index j is passed on the left-hand side of a formula object.
(Computations have been performed with R version 3.4.4 on a machine with an Intel Core i5-4200U CPU 1.60GHz processor and 8GB RAM.)
The covariates X1, ..., Xp are passed on the right-hand side. In order to ensure identifiability, intercepts can be included or excluded in the model depending on the chosen model parameterization.
Model without intercept  If the intercept should be removed, the formula can be specified in two equivalent ways (a sketch is given below).
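A minimal sketch of the two equivalent variants, using the toy data covariates and the standard R formula convention of removing the intercept with 0 or -1:

MMO(Y, i, j) ~ 0 + X1 + X2 + f1 + f2

MMO(Y, i, j) ~ -1 + X1 + X2 + f1 + f2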
Model with intercept  If one wants to include an intercept in the model, there are two equivalent possibilities to set the model formula: either the intercept is included explicitly, or it is left implicit (a sketch is given below).
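A minimal sketch of the two equivalent variants (again using the toy data covariates):

MMO(Y, i, j) ~ 1 + X1 + X2 + f1 + f2

MMO(Y, i, j) ~ X1 + X2 + f1 + f2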
Note on intercept in formula We differ in our approach of specifying the model formula
from the formula objects in e.g., MASS::polr() or ordinal::clm(), in that we allow the user
to specify models without intercept. This option is not supported in the MASS and ordinal
packages, where an intercept is always specified in formula as the threshold parameters are
treated as intercepts. We choose to allow for this option, in order to have a correspondence
to the identifiability constraints presented in Section 2.3.
Even so, the user should be aware that the threshold parameters are basically category and
outcome-specific intercepts. This implies that, even if the intercept is explicitly removed from
the model through the formula object and hence set to zero, the rest of the covariates should
be specified in such a way that multicollinearity does not arise. This is of primary importance
when including categorical covariates, where one category will be taken as baseline by default.
The multiple measurement object MMO2 is only applicable for settings where the covariates do
not vary between the multiple measurements.
Data structure
The data structure applied by MMO2 is slightly simplified, where the multiple ordinal observations as well as the covariates are stored as columns in a data.frame. Each subject i corresponds to one row of the data frame, where all outcomes Yi1, . . . , Yiq (with missing observations set to NA) and all the covariates xi1, . . . , xip are stored in different columns. Ideally
each outcome column is of type ordered ‘factor’. If the columns of the responses have types like ‘integer’, ‘character’ or ‘factor’, a warning is displayed and the natural ordering is used (ascending, or alphabetical).
Formula
In order to specify the model we use a multivariate formula object of the form:
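A minimal sketch of such a formula for the toy data in wide format (the response and covariate column names follow data_mvord_toy):

MMO2(Y1, Y2) ~ 0 + X1 + X2 + f1 + f2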
The ordering of the responses is given by the ordering in the left-hand side of the model
formula. MMO2 performs like cbind() in fitting multivariate models in e.g., lm() or glm().
The multivariate probit link (the default) is specified by:

link = mvprobit()
For the multivariate logit link a t copula based multivariate distribution with logistic margins
is used (as explained in Section 2.2) and can be specified by:
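A minimal sketch of this specification (df defaults to 8, as described below, so it could also be omitted):

link = mvlogit(df = 8L)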
The mvlogit() function has an optional integer valued argument df which specifies the
degrees of freedom to be used for the t copula. The default value of the degrees of freedom
parameter is 8. When choosing ν ≈ 8, the multivariate logistic distribution in Equation 2
is well approximated by a multivariate t distribution (O’Brien and Dunson 2004). This is
also the value chosen by Nooraee et al. (2016) in their analysis. We restrict the degrees
of freedom to be integer valued because the most efficient routines for computing bivariate
t probabilities do not support non-integer degrees of freedom. We use the Fortran code from
Alan Genz (Genz and Bretz 2009) to compute the bivariate t probabilities. As the degrees of
freedom parameter is integer valued, we do not estimate it in the optimization procedure. If
the optimal degrees of freedom are of interest, we leave the task of choosing an appropriate
grid of values of df to the user, who could then estimate a separate model for each value in
the grid. The best model can be chosen by CLAIC or CLBIC.
Basic model
In the basic model we support three different correlation structures and one covariance structure:
Correlation  For the basic model specification the following correlation structures are implemented in mvord:
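A minimal sketch of the corresponding specifications, assuming the error structure functions take a formula argument which is ~ 1 in the basic model:

error.structure = cor_general(~ 1)  # general correlation structure
error.structure = cor_equi(~ 1)     # equicorrelation structure
error.structure = cor_ar1(~ 1)      # AR(1) correlation structure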
Variance  A model with variance parameters VAR(ǫij) = σj² corresponding to each outcome, when the identifiability requirements are fulfilled, can be specified in the following way:
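A sketch of this specification, using the cov_general error structure named in Table 2:

error.structure = cov_general(~ 1)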
Correlation
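The covariate dependent error structures described in Section 2 are specified by passing a covariate formula to the error structure functions. A hedged sketch using the toy data covariates (the exact specification in the original text is assumed, not shown):

error.structure = cor_general(~ f1)    # general correlation depending on a factor
error.structure = cor_equi(~ X1 + X2)  # equicorrelation depending on covariates
error.structure = cor_ar1(~ X1 + X2)   # AR(1) correlation depending on covariates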
Variance
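Analogously, a hedged sketch of a covariance structure whose parameters depend on a categorical covariate (again using the factor f1 of the toy data):

error.structure = cov_general(~ f1)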
Equal threshold parameters across the two outcomes of the worked example are imposed through the argument threshold.constraints:

threshold.constraints = c(1, 1)

where the first value corresponds to the first response Y1 and the second to the second response Y2. This order of the responses is defined as explained in Sections 3.1 and 3.2.
Fixed threshold values can be specified through the argument threshold.values, which takes a list with one numeric vector per outcome j of length Kj − 1 (where Kj is the number of ordered categories for ordinal outcome j). A numeric value in this vector fixes the corresponding threshold parameter to the specified value, while NA leaves the parameter flexible and indicates it should be estimated.
After specifying the error structure (through the error.structure argument) and the in-
clusion/exclusion of an intercept in the formula argument, the user can choose among five
possible options for fixing the thresholds:
• leaving all threshold parameters flexible;
• fixing the first threshold θj,1 = aj for all outcomes;
• fixing the first and second thresholds θj,1 = aj, θj,2 = bj for all outcomes with Kj > 2;
• fixing the first and last thresholds θj,1 = aj, θj,Kj−1 = bj for all outcomes with Kj > 2;
• fixing all threshold parameters.
Note that the option chosen needs to be consistent across the different outcomes (e.g., it is
not allowed to fix the first and the last threshold for one outcome and the first and the second
threshold for a different outcome). Table 2 provides information about the options available
for each combination of error structure and intercept, as well as about the default values in
case the user does not specify any threshold values. In the presence of binary observations
(Kj = 2), if a cov_general error structure is used, the intercept has always to be fixed to
some value due to identifiability constraints. In a correlation structure setting no further
restrictions are required.
For example, if the following restrictions should apply to the worked example:
• θ11 = −1 ≤ θ12 ,
• θ21 = −1 ≤ θ22 ,
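then a minimal sketch of the corresponding specification fixes the first threshold of each outcome to −1 and leaves the second one flexible:

threshold.values = list(c(-1, NA), c(-1, NA))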
                                      Threshold parameters
Error                  all        one fixed    two fixed       two fixed          all
structure   Intercept  flexible   θj,1 = aj    θj,1 = aj,      θj,1 = aj,         fixed
                                               θj,2 = bj       θj,Kj−1 = bj
cor         no         X*         X            X               X                  X
cor         yes                   X*           X               X                  X
cov         no                    X*           X               X                  X
cov         yes                                X*              X                  X

Table 2: This table displays different model parameterizations in the presence of ordinal observations (Kj > 2 ∀j ∈ J). The row cor includes error structures cor_general, cor_equi and cor_ar1, while row cov includes the error structure cov_general. The minimal restrictions (default) to ensure identifiability are marked with an asterisk (*). The default threshold values (in case threshold.values = NULL) are always aj = 0 and bj = 1.
Constraints on the regression coefficients are specified through the argument coef.constraints, e.g., as a vector of positive integers of length q. In the worked example, the regression coefficients for variables X1 and X2 are set to be equal across the two outcomes (β1 = β2) by:

coef.constraints = c(1, 1)
A more flexible framework allows the user to specify constraints for each of the regression coefficients of the p covariates (if categorical covariates or interaction terms are included in the formula, p denotes the number of columns of the design matrix) and not only for the whole vector. Such constraints are specified by means of a matrix of dimension q × p, where each column specifies constraints for one of the p covariates in the same way as presented above. Moreover, a value of NA indicates that the corresponding coefficient is fixed (as we will show below) and should not be estimated.
Consider the following specification of the latent processes in the worked example:
$$\tilde{Y}_{i1} = \beta_{11} x_{i1} + \beta_3 \mathbb{1}_{\{f_{i2}=c2\}} + \epsilon_{i1}, \qquad \tilde{Y}_{i2} = \beta_{21} x_{i1} + \beta_{22} x_{i2} + \beta_3 \mathbb{1}_{\{f_{i2}=c2\}} + \epsilon_{i2}, \qquad (4)$$
where $\mathbb{1}_{\{f_{i2}=c2\}}$ is the indicator function which equals one in case the categorical covariate f2 is equal to class c2. Class c1 is taken as the baseline category. These restrictions on the
regression coefficients are imposed by:
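A minimal sketch of this specification (the exact matrix used in the original text is assumed; NA marks the coefficient of X2 for the first outcome, which is fixed rather than estimated, and f2c2 denotes the dummy coefficient of factor f2):

coef.constraints = cbind(X1   = c(1, 2),
                         X2   = c(NA, 1),
                         f2c2 = c(1, 1))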
Specific values of coefficients can be fixed through the coef.values argument, as we will
show in the following.
• β11,1 ≠ β11,2;
• β22,1 ≠ β22,2,
where βjk,r denotes the regression coefficient of covariate k in the linear predictor of the r-th
cumulative probit or logit for measurement j. By the first restriction, for the first outcome
two regression coefficients are employed for covariate X1: β11,1 for the first linear predictor and
β11,2 for the second linear predictor. Covariate X2 only appears in the model for the second
outcome. For each of the two linear predictors a different regression coefficient is estimated:
β22,1 and β22,2 .
The constraints are set up as a named list where the names correspond to the names of all
covariates in the model.matrix. To check the name of the covariates in the model matrix
one can use the auxiliary function names_constraints() available in mvord (see also next
subsection):
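A minimal sketch of such a call for the toy data (assuming the formula without intercept from above):

R> names_constraints(MMO(Y, i, j) ~ 0 + X1 + X2 + f1 + f2, data = data_toy_long)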
The number of rows is equal to the total number of linear predictors $\sum_j (K_j - 1)$ of the ordered responses, in the example above 2 + 2 = 4 rows. The number of columns represents the number of parameters to be estimated for each covariate:
coef.constraints = list(
X1 = cbind(c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 1)),
X2 = cbind(c(0, 0, 1, 0), c(0, 0, 0, 1)), f2c2 = cbind(rep(1, 4)))
For more details we refer the reader to the documentation of the VGAM package. The output of names_constraints() should be used when setting up the coefficient constraints. Please note that by default
category A for factor f1 and category c1 for factor f2 are taken as baseline categories. This
can be changed by using the optional argument contrasts. In models without intercept,
the estimated threshold parameters relate to the baseline category and the coefficients of the
remaining factor levels can be interpreted as a shift of the thresholds.
weights.name
Weights on each subject i are chosen in a way that they are constant across multiple measure-
ments. Weights should be stored in a column of data. The column name of the weights in data
should be passed as a character string to this argument by weights.name = "weights". If
weights.name = NULL all weights are set to one by default. Negative weights are not allowed.
offset
If offsets are not specified in the model formula, a list with a vector of offsets for each multiple
measurement can be passed.
contrasts
contrasts can be specified by a named list as in the argument contrasts.arg of
model.matrix.default().
PL.lag
In longitudinal studies, where qi is possibly large, the pairwise likelihood estimation can be
time consuming as it is built from all two dimensional combinations of j ∈ Ji . To overcome
this difficulty, one can construct the likelihood using only the bivariate probabilities for pairs
of observations less than lag in “time units” apart. A similar approach was proposed by Varin
and Czado (2009). Assuming that, for each subject i, we have a time-series of consecutive
ordinal observations, the i-th component of the pairwise likelihood has the following form:
$$p\ell_i^{\mathrm{lag}}(\delta) = w_i \sum_{k=1}^{q_i - 1} \sum_{l=k+1}^{q_i} \mathbb{1}_{\{|k-l| \le \mathrm{lag}\}} \log\bigl(P(Y_{ik} = r_{ik}, Y_{il} = r_{il})\bigr).$$
The lag can be fixed by a positive integer argument PL.lag and it can only be used along with
error.structure = cor_ar1(). The use of this argument is, however, not recommended if
there are missing observations in the time series, i.e., if the ordinal variables are not observed
in consecutive years. Moreover, one should also proceed with care if the observations are not
missing at random.
solver
This argument can either be a character string or a function. All general purpose opti-
mizers of the R package optimx (Nash and Varadhan 2011; Nash 2014) can be used for
maximization of the composite log-likelihood by passing the name of the solver as a char-
acter string to the solver argument. The available solvers in optimx are, at the time of
writing, "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlm", "nlminb", "spg", "ucminf",
"newuoa", "bobyqa", "nmkb", "hjkb", "Rcgmin" and "Rvmmin". The default in mvord is
solver "newuoa". The "BFGS" solver performs well in terms of computation time, but it
suffers from convergence problems, especially for the mvlogit() link.
Alternatively, the user has the possibility of applying other solvers by using a wrapper function
with arguments starting.values and objFun of the following form:
solver = function(starting.values, objFun) {
optRes <- solver.function(...)
list(optpar = optRes$optpar, objvalue = optRes$objvalue,
convcode = optRes$convcode, message = optRes$message)
}
The solver.function() should return a list of three elements optpar, objvalue and
convcode. The element optpar should be a vector of length equal to number of parame-
ters to optimize containing the estimated parameters, while the element objvalue should
contain the value of the objective function after the optimization procedure. The convergence
status of the optimization procedure should be returned in element convcode with 0 indicat-
ing successful convergence. Moreover, an optional solver message can be returned in element
message.
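For illustration, a minimal sketch of such a wrapper built around stats::optim() (assuming objFun takes the parameter vector as its only required argument; optim() is used here purely as an example of a solver.function()):

solver = function(starting.values, objFun) {
  optRes <- optim(par = starting.values, fn = objFun, method = "BFGS",
                  control = list(maxit = 10000))
  list(optpar   = optRes$par,          # estimated parameters
       objvalue = optRes$value,        # objective function value at the optimum
       convcode = optRes$convergence,  # 0 indicates successful convergence
       message  = optRes$message)      # optional solver message (may be NULL)
}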
solver.optimx.control
A list of control arguments that are to be passed to the function optimx(). For further details
see Nash and Varadhan (2011).
se
If se = TRUE standard errors are computed analytically using the Godambe information ma-
trix (see Section 2.5).
start.values
A list of starting values for threshold as well as regression coefficients can be passed by the
argument start.values. This list contains a list (with a vector of starting values for each
dimension) theta of all flexible threshold parameters and a list beta of all flexible regression
parameters.
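A minimal sketch for the bivariate toy example, assuming a model formula with only X1 and X2 (so two flexible regression coefficients per outcome) and three response categories (two flexible thresholds per outcome):

start.values = list(theta = list(c(-1, 1), c(-1, 1)),
                    beta  = list(c(0, 0), c(0, 0)))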
Several methods are implemented for the class ‘mvord’. These methods include a summary()
and a print() function to display the estimation results, a coef() function to extract the re-
gression coefficients, a thresholds() function to extract the threshold coefficients and a func-
tion error_structure() to extract the estimated parameters of the correlation/covariance
structure of the errors. The pairwise log-likelihood can be extracted by the function logLik(),
function vcov() extracts the variance-covariance matrix of the parameters and nobs() pro-
vides the number of subjects. Other standard methods such as terms() and model.matrix()
are also available. Functions AIC() and BIC() can be used to extract the composite likelihood
information criteria CLAIC and CLBIC.
In addition, joint probabilities can be extracted by the predict() or fitted() function:
R> predict(res, subjectID = 1:6)
1 2 3 4 5 6
0.9982776 0.2830394 0.9985192 1.0000000 0.8782797 0.9963333
1 2 3 4 5 6
0.9982776 1.0000000 1.0000000 1.0000000 0.9745760 0.9963333
and classes:
Y1 Y2
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 1 1
The function marginal_predict() provides marginal predictions for the types probability,
cumulative probability and class, while joint_probabilities() extracts fitted joint proba-
bilities (or cumulative probabilities) for given response categories from a fitted model.
4. Examples
In credit risk applications, multiple credit ratings from different credit rating agencies are
available for a panel of firms. Such a data set has been analyzed in ?, where a multivariate
model of corporate credit ratings has been proposed. Unfortunately, this original data set is
not freely re-distributable. Therefore, we resort to the simulation of illustrative data sets by
taking into consideration key features of the original data.
We simulate relevant covariates corresponding to firm-level and market financial ratios in the
original data set. The following covariates are chosen in line with literature on determinants
of credit ratings (e.g., Campbell, Hilscher, and Szilagyi 2008; Puccia, Collett, Kernan, Palmer,
Mettrick, and Deslondes 2013): LR (liquidity ratio relating the current assets to current liabil-
ities), LEV (leverage ratio relating debt to earnings before interest and taxes), PR (profitability
ratio of retained earnings to assets), RSIZE (logarithm of the relative size of the company in
the market), BETA (a measure of systematic risk). We fit a distribution to each covariate
using the function fitdistr() of the MASS package. The best fitting distribution among
all available distributions in fitdistr() has been chosen by AIC.
We generate two data sets for illustration purposes. The first data set data_cr consists of
multiple ordinal rating observations at the same point in time for a collection of 690 firms.
We generate ratings from four rating sources rater1, rater2, rater3 and rater4. Raters
rater1 and rater2 assign ordinal ratings on a five-point scale (from best to worst A, B, C,
D and E), rater3 on a six-point scale (from best to worst, F, G, H, I, J and K) and rater4
distinguishes between investment and speculative grade firms (from best to worst, L and M).
The panel of ratings in the original data set is unbalanced, as not all firms receive ratings
from all four sources. We therefore keep the missingness pattern and remove the simulated
ratings that correspond to missing observations in the original data. For rater1 we remove
5%, for rater2 30%, and for rater3 50% of the observations. This data set has a wide data
format.
The second data set data_cr_panel contains ordinal rating observations assigned by one rater to a panel of 1415 firms over a period of eight years on a yearly basis. In addition to
the covariates described above, a business sector variable (BSEC) with eight levels is included
for each firm. This data set has a long format, with 11320 firm-year observations.
R> data("data_cr")
R> head(data_cr, n = 3)
We include five financial ratios as covariates in the model without intercept through the
following formula:
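A minimal sketch of such a formula, using the rater and ratio column names described above (the helper object name formula_cr is introduced here for reuse below):

R> formula_cr <- MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE + BETA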
Figure 1: This figure displays the rating distribution of all the raters.
We are dealing with a wide data format, as the covariates do not vary among raters. Hence, the estimation can be performed by applying the multiple measurement object MMO2 in the fitting function mvord(). A model with multivariate probit link (default) is fitted by:
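A minimal sketch of the call (res_cor_probit_simple is the object name used in the remainder of this section; formula_cr is the sketch from above):

R> res_cor_probit_simple <- mvord(formula = formula_cr, data = data_cr)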
(runtime 2 minutes).
The results are displayed by the function summary():
Thresholds:
Estimate Std. Error z value Pr(>|z|)
rater1 A|B 8.05308 0.44312 18.174 < 2.2e-16 ***
rater1 B|C 9.57196 0.47384 20.201 < 2.2e-16 ***
rater1 C|D 11.35469 0.51753 21.940 < 2.2e-16 ***
rater1 D|E 13.52181 0.60134 22.486 < 2.2e-16 ***
rater2 A|B 8.59974 0.49820 17.262 < 2.2e-16 ***
rater2 B|C 10.06007 0.53930 18.654 < 2.2e-16 ***
rater2 C|D 11.86508 0.59726 19.866 < 2.2e-16 ***
rater2 D|E 14.34057 0.70069 20.466 < 2.2e-16 ***
rater3 F|G 8.24546 0.51708 15.946 < 2.2e-16 ***
rater3 G|H 9.77754 0.55527 17.608 < 2.2e-16 ***
Coefficients:
Estimate Std. Error z value Pr(>|z|)
LR 1 0.208387 0.067996 3.0647 0.002179 **
LR 2 0.153527 0.073349 2.0931 0.036340 *
LR 3 0.180650 0.078391 2.3045 0.021195 *
LR 4 0.150135 0.112011 1.3404 0.180128
LEV 1 0.430524 0.043758 9.8388 < 2.2e-16 ***
LEV 2 0.433143 0.050132 8.6400 < 2.2e-16 ***
LEV 3 0.399637 0.050768 7.8719 3.493e-15 ***
LEV 4 0.626346 0.074278 8.4325 < 2.2e-16 ***
PR 1 -2.574577 0.194047 -13.2678 < 2.2e-16 ***
PR 2 -2.829004 0.216932 -13.0410 < 2.2e-16 ***
PR 3 -2.679726 0.222574 -12.0397 < 2.2e-16 ***
PR 4 -2.797267 0.281530 -9.9360 < 2.2e-16 ***
RSIZE 1 -1.130529 0.056380 -20.0518 < 2.2e-16 ***
RSIZE 2 -1.197017 0.061751 -19.3845 < 2.2e-16 ***
RSIZE 3 -1.196935 0.066398 -18.0266 < 2.2e-16 ***
RSIZE 4 -1.567831 0.116397 -13.4696 < 2.2e-16 ***
BETA 1 1.602576 0.110842 14.4581 < 2.2e-16 ***
BETA 2 1.802612 0.140077 12.8687 < 2.2e-16 ***
BETA 3 1.517178 0.139209 10.8985 < 2.2e-16 ***
BETA 4 1.990449 0.204850 9.7166 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error Structure:
Estimate Std. Error z value Pr(>|z|)
corr rater1 rater2 0.874183 0.024864 35.158 < 2.2e-16 ***
corr rater1 rater3 0.914814 0.023171 39.481 < 2.2e-16 ***
corr rater1 rater4 0.900697 0.031939 28.201 < 2.2e-16 ***
corr rater2 rater3 0.837847 0.041416 20.230 < 2.2e-16 ***
corr rater2 rater4 0.926213 0.031728 29.192 < 2.2e-16 ***
corr rater3 rater4 0.845626 0.060134 14.062 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The threshold parameters are labeled with the name of the corresponding outcome and the two adjacent categories, which are separated by a vertical bar |. For each covariate the estimated coefficients are labeled with the covariate name and a number. This number indexes the columns of the list element of constraints() which corresponds to the covariate. Note that if no constraints are set on the regression coefficients, this number corresponds to the outcome dimension. If constraints are set
on the parameter space, we refer the reader to Section 4.2. The last part of the summary
contains the estimated error structure parameters. For error structures cor_general and
cov_general the correlations (and variances) are displayed. The coefficients corresponding
to the error structure are displayed for cor_ar1 and cor_equi. Correlations and Fisher-z
scores for each subject are obtained by function error_structure().
Another option to display the results is the function print(). The threshold coefficients can
be extracted by the function thresholds():
R> thresholds(res_cor_probit_simple)
$rater1
A|B B|C C|D D|E
8.053083 9.571962 11.354686 13.521806
$rater2
A|B B|C C|D D|E
8.599739 10.060068 11.865083 14.340568
$rater3
F|G G|H H|I I|J J|K
8.245461 9.777541 11.709568 13.097152 14.177082
$rater4
L|M
13.54304
R> coef(res_cor_probit_simple)
LR 1 LR 2 LR 3 LR 4 LEV 1 LEV 2
0.2083869 0.1535266 0.1806502 0.1501350 0.4305235 0.4331427
LEV 3 LEV 4 PR 1 PR 2 PR 3 PR 4
0.3996369 0.6263461 -2.5745773 -2.8290041 -2.6797255 -2.7972672
RSIZE 1 RSIZE 2 RSIZE 3 RSIZE 4 BETA 1 BETA 2
-1.1305294 -1.1970173 -1.1969355 -1.5678310 1.6025757 1.8026120
BETA 3 BETA 4
1.5171782 1.9904487
The error structure for firm with firm_id = 11 is displayed by the function
error_structure():
R> error_structure(res_cor_probit_simple)[[11]]
• We assume that rater1 and rater2 use the same rating methodology. This means that
they use the same rating classes with the same labeling and the same thresholds on the
latent scale. Hence, we set the following constraints on the threshold parameters:
threshold.constraints = c(1, 1, 2, 3)
• We assume that some covariates are equal for some of the raters. We assume that the coefficients of LR and PR are equal for all four raters, that the coefficients of RSIZE are equal for the raters rater1, rater2 and rater3, and that the coefficients of BETA are the same for the raters rater1 and rater2. The coefficients of LEV are assumed to vary for all four raters. These restrictions are imposed by the coef.constraints argument in the combined call sketched below.
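A minimal sketch of the combined call (res_cor_logit is the object name used below; the constraint matrix is an assumption that encodes the restrictions above, one column per covariate and one row per rater):

R> res_cor_logit <- mvord(formula = formula_cr, data = data_cr,
+    link = mvlogit(),
+    threshold.constraints = c(1, 1, 2, 3),
+    coef.constraints = cbind(LR    = c(1, 1, 1, 1),
+                             LEV   = c(1, 2, 3, 4),
+                             PR    = c(1, 1, 1, 1),
+                             RSIZE = c(1, 1, 1, 2),
+                             BETA  = c(1, 1, 2, 3)))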
(runtime 7 minutes).
The results are displayed by the function summary():
Thresholds:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
LR 1 0.340210 0.110547 3.0775 0.002087 **
LEV 1 0.784295 0.075977 10.3228 < 2.2e-16 ***
LEV 2 0.779695 0.078364 9.9497 < 2.2e-16 ***
LEV 3 0.718330 0.093425 7.6889 1.484e-14 ***
LEV 4 1.107836 0.123681 8.9572 < 2.2e-16 ***
PR 1 -4.917965 0.343464 -14.3187 < 2.2e-16 ***
RSIZE 1 -2.093379 0.103690 -20.1889 < 2.2e-16 ***
RSIZE 2 -2.746162 0.188731 -14.5507 < 2.2e-16 ***
BETA 1 3.135693 0.221944 14.1283 < 2.2e-16 ***
BETA 2 2.733086 0.252960 10.8044 < 2.2e-16 ***
BETA 3 3.572688 0.349493 10.2225 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error Structure:
Estimate Std. Error z value Pr(>|z|)
corr rater1 rater2 0.859773 0.027907 30.808 < 2.2e-16 ***
corr rater1 rater3 0.908834 0.024636 36.891 < 2.2e-16 ***
corr rater1 rater4 0.903959 0.031857 28.375 < 2.2e-16 ***
corr rater2 rater3 0.834910 0.044258 18.865 < 2.2e-16 ***
corr rater2 rater4 0.932243 0.032172 28.977 < 2.2e-16 ***
corr rater3 rater4 0.856221 0.058398 14.662 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If constraints on the threshold or regression coefficients are imposed, duplicated estimates are not displayed. If thresholds are set equal for two outcome dimensions, only the thresholds for the first of the two dimensions are shown. In the example above only the thresholds for rater1 are displayed. For each covariate the estimated coefficients are labeled with the covariate name and a number. This number indexes the columns of the list element of the corresponding covariate in constraints() (see Section 3.6.3). The auxiliary function
constraints() can be used to extract the constraints on the coefficients. The column names
of the constraint matrices for each outcome correspond to the coefficient names displayed in
the summary. For each covariate the coefficients to be estimated are numbered consecutively.
In the above example this means that for covariates LR and PR only one coefficient is estimated,
a coefficient for each outcome is estimated for LEV, while for covariate RSIZE two and for
covariate BETA three coefficients are estimated. For example, the coefficient BETA 1 is used
by rater1 and rater2, the coefficient BETA 2 is used by rater3 while BETA 3 is the coefficient
for rater4. The constraints for covariate BETA can be extracted by:
R> constraints(res_cor_logit)$BETA
Table 3: This table displays measures of fit for the multivariate probit model in Example 1
(presented in Section 4.1) and the multivariate logit model in Example 2 (presented in Sec-
tion 4.2).
Figure 2: This figure displays agreement plots of the predicted categories of the model
res_cor_logit against the observed rating categories for all raters. For each observed rating
class the distribution of the predicted ratings is displayed.
In the following example we fit a multivariate ordinal probit regression model with a covariate dependent AR(1) error structure using the data set data_cr_panel:
R> data("data_cr_panel")
R> str(data_cr_panel, vec.len = 3)
R> head(data_cr_panel, n = 3)
The simulated data set has a long data format and contains the credit risk measure rating and
six covariates for a panel of 1415 firms over eight years. The number of firm-year observations
is 11320.
We include five financial ratios as covariates in the model with intercept by a formula with
multiple measurements object MMO:
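A minimal sketch of this formula (the subject index column firm_id and the measurement index column year are assumed column names of data_cr_panel; the rating column is called rating as described above):

R> formula_panel <- MMO(rating, firm_id, year) ~ 1 + LR + LEV + PR + RSIZE + BETA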
• The threshold parameters are constant over the years. This can be specified through the argument threshold.constraints (see the combined call sketched after this list).
• We assume that there is a break-point in the regression coefficients after year4 in the sample. This break-point could correspond to the beginning of a crisis in a real case application. Hence, we use one set of regression coefficients for years year1, year2, year3 and year4 and a different set for year5, year6, year7 and year8. This can be specified through the argument coef.constraints (see the combined call sketched after this list).
• Given the longitudinal aspect of the data, an AR(1) correlation structure is an appro-
priate choice. Moreover, we use the business sector as a covariate in the correlation
structure. The dependence of the correlation structure on the business sector is moti-
vated by the fact that in some sectors such as manufacturing ratings tend to be more
“sticky”, i.e., do not change often over the years, while in more volatile sectors like IT
there is less “stickiness” in the ratings.
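A minimal sketch of the combined call (res_AR1_probit is the object name used below; formula_panel is the sketch from above; the constraint and error structure arguments are assumptions consistent with the description in this list):

R> res_AR1_probit <- mvord(formula = formula_panel, data = data_cr_panel,
+    error.structure = cor_ar1(~ BSEC),
+    threshold.constraints = rep(1, 8),
+    coef.constraints = c(1, 1, 1, 1, 2, 2, 2, 2))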
(runtime 8 minutes).
The results of the model can be presented by the function summary():
Thresholds:
Estimate Std. Error z value Pr(>|z|)
year1 A|B 0.000000 0.000000 NA NA
year1 B|C 0.984647 0.025802 38.162 < 2.2e-16 ***
year1 C|D 2.364711 0.039873 59.306 < 2.2e-16 ***
year1 D|E 3.728002 0.055724 66.901 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1 1.42471225 0.04556961 31.2645 < 2.2e-16 ***
(Intercept) 2 1.49164394 0.05786976 25.7759 < 2.2e-16 ***
LR 1 0.02142909 0.00054203 39.5346 < 2.2e-16 ***
LR 2 0.02959425 0.00096574 30.6442 < 2.2e-16 ***
LEV 1 0.01114252 0.00052558 21.2004 < 2.2e-16 ***
LEV 2 0.01390128 0.00081658 17.0238 < 2.2e-16 ***
PR 1 -0.87154954 0.03320032 -26.2512 < 2.2e-16 ***
PR 2 -0.67501624 0.04542960 -14.8585 < 2.2e-16 ***
RSIZE 1 -0.34752657 0.00995679 -34.9035 < 2.2e-16 ***
Error Structure:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.408874 0.054179 26.0042 < 2.2e-16 ***
BSECBSEC2 -0.487134 0.071649 -6.7989 1.054e-11 ***
BSECBSEC3 -0.055125 0.064215 -0.8585 0.39064
BSECBSEC4 -0.108108 0.062361 -1.7336 0.08299 .
BSECBSEC5 -0.069888 0.079575 -0.8783 0.37980
BSECBSEC6 -0.599137 0.069668 -8.5999 < 2.2e-16 ***
BSECBSEC7 -0.764239 0.067277 -11.3597 < 2.2e-16 ***
BSECBSEC8 -0.653992 0.078939 -8.2848 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the fixed threshold coefficient year1 A|B, the z values and the corresponding p values are
set to NA.
The default error_structure() method for a ‘cor_ar1’ gives:
R> error_structure(res_AR1_probit)
In addition, the correlation parameters ρi for each firm are obtained by choosing type =
"corr" in error_structure():
Correlation
1 0.8749351
2 0.6694448
3 0.8749351
Moreover, the correlation matrices for each specific firm are obtained by choosing type =
"sigmas" in error_structure():
$`1`
year1 year2 year3 year4 year5 year6
As a further example, we consider the self-reported health status data in data_SRHS_long:
R> str(data_SRHS_long)
The data set contains the self-reported health status srhs together with covariates such as
ethnicity (race, coded as 1 = white, 2 = black, 3 = others), gender (coded as 1 = male, 2
= female), education level (coded as 1 = high school, 2 = general educational diploma, 3 =
high school graduate, 4 = some college, 5 = college and above) and age for 7074 subjects at
eight (approximately equally spaced) time occasions.
We add a time index to the data set:
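One way to construct such an index is sketched below; it assumes that the subject identifier
column is named id and that the rows of each subject are ordered in time.
R> ## running time index 1, ..., 8 within each subject (column name id assumed)
R> data_SRHS_long$t <- ave(rep(1, nrow(data_SRHS_long)),
+    data_SRHS_long$id, FUN = seq_along)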
We estimate a multivariate ordinal logit model similar to one of the models employed by
Bartolucci, Pandolfi, and Pennoni (2017), where for every subject in the sample the errors
follow an AR(1) process. Moreover, the threshold and regression coefficients are equal across
all years. In order to reduce the computational burden we only consider pairs of observations
at most two time points apart by setting PL.lag = 2.
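A call of the following form matches this description. It is a sketch only: the response,
subject and time columns (srhs, id, t) and the object name res_AR1_logit are assumptions,
and the covariate specification is inferred from the output below.
R> res_AR1_logit <- mvord(
+    formula = MMO(srhs, id, t) ~ 0 + factor(gender) + factor(race) +
+      factor(education) + age,
+    data = data_SRHS_long, link = mvlogit(),
+    error.structure = cor_ar1(~ 1),
+    threshold.constraints = rep(1, 8),
+    coef.constraints = rep(1, 8), PL.lag = 2)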
(runtime 11 minutes).
The persistence in the reported health status is high. The correlation parameter in the
cor_ar1 error structure is
Correlation
1 0.7621568
Thresholds:
Estimate Std. Error z value Pr(>|z|)
1 1|2 -1.78171 0.12730 -13.9958 <2e-16 ***
1 2|3 -0.10283 0.12866 -0.7993 0.4241
1 3|4 1.42587 0.13019 10.9525 <2e-16 ***
1 4|5 3.05552 0.13322 22.9364 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficients:
Estimate Std. Error z value Pr(>|z|)
factor(gender)2 1 -0.0042032 0.0346822 -0.1212 0.9035
factor(race)2 1 0.6131727 0.0504085 12.1641 < 2.2e-16 ***
factor(race)3 1 0.5060946 0.0897595 5.6383 1.717e-08 ***
factor(education)2 1 -0.5348146 0.0803199 -6.6586 2.765e-11 ***
factor(education)3 1 -1.0149545 0.0465579 -21.7998 < 2.2e-16 ***
factor(education)4 1 -1.3080628 0.0526356 -24.8513 < 2.2e-16 ***
Error Structure:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0013411 0.0084935 117.9 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the logit model the coefficients can be interpreted in terms of log-odds ratios. The results
suggest that being non-white increases the chances of reporting a worse health status, while
people with higher education levels tend to report better health. Moreover, every additional
year increases the odds of reporting a worse health status by 1.64% (exp(0.0162628) =
1.016396).
The traditional measure of agreement between raters in the social sciences is the polychoric
correlation. It can be assessed with the mvord package by estimating a model with no
covariates and a probit link. The correlation parameters of the cor_general error structure
can then be interpreted as measures of agreement.
R> head(df)
The data are in wide format (each ordinal response is in a separate column), so the MMO2
object is used in the formula:
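A sketch of such a call is given below; the judge columns Judge1 to Judge5 are taken from
the output that follows, while the object name res_polycor is hypothetical.
R> res_polycor <- mvord(
+    formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0,
+    data = df, link = mvprobit(),
+    error.structure = cor_general(~ 1))
R> summary(res_polycor)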
(runtime 19 seconds).
Thresholds:
Estimate Std. Error z value Pr(>|z|)
Judge1 1|2 -1.24483406 0.07950626 -15.6571 < 2.2e-16 ***
Judge1 2|3 -0.66341394 0.06768901 -9.8009 < 2.2e-16 ***
Judge1 3|4 -0.32971161 0.06516921 -5.0593 4.208e-07 ***
Judge1 4|5 0.00013865 0.06328033 0.0022 0.9983
Judge1 5|6 0.34966460 0.06546890 5.3409 9.247e-08 ***
Judge1 6|7 0.63528723 0.06688036 9.4989 < 2.2e-16 ***
Judge1 7|8 0.99375222 0.07409651 13.4116 < 2.2e-16 ***
Judge1 8|9 1.33448377 0.08063648 16.5494 < 2.2e-16 ***
Judge1 9|10 1.91969997 0.10649013 18.0270 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Error Structure:
Estimate Std. Error z value Pr(>|z|)
corr Judge1 Judge2 0.476079 0.061835 7.6992 1.370e-14 ***
corr Judge1 Judge3 0.569194 0.046682 12.1930 < 2.2e-16 ***
corr Judge1 Judge4 0.645197 0.046595 13.8468 < 2.2e-16 ***
corr Judge1 Judge5 0.322529 0.090318 3.5710 0.0003556 ***
corr Judge2 Judge3 0.396377 0.056243 7.0475 1.821e-12 ***
corr Judge2 Judge4 0.634421 0.046930 13.5186 < 2.2e-16 ***
corr Judge2 Judge5 0.621900 0.041815 14.8725 < 2.2e-16 ***
corr Judge3 Judge4 0.391911 0.072272 5.4227 5.871e-08 ***
corr Judge3 Judge5 0.382999 0.066020 5.8012 6.582e-09 ***
corr Judge4 Judge5 0.693040 0.041728 16.6087 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In a second model we additionally include the average word length wl of the essays as a
covariate (runtime 1 minute).
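A sketch of the corresponding call, under the assumption that a single regression coefficient
for wl is shared across the five judges (as suggested by the label wl 1 in the output below):
R> res_polycor_wl <- mvord(
+    formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0 + wl,
+    data = df, link = mvprobit(),
+    error.structure = cor_general(~ 1),
+    coef.constraints = rep(1, 5))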
Thresholds:
Estimate Std. Error z value Pr(>|z|)
Judge1 1|2 1.09799 0.78039 1.4070 0.1594326
Judge1 2|3 1.68728 0.78221 2.1571 0.0309994 *
Judge1 3|4 2.02523 0.78287 2.5869 0.0096838 **
Judge1 4|5 2.36059 0.78294 3.0150 0.0025694 **
Judge1 5|6 2.71667 0.78472 3.4620 0.0005362 ***
Judge1 6|7 3.00649 0.78550 3.8275 0.0001295 ***
Judge1 7|8 3.36963 0.78891 4.2712 1.944e-05 ***
Judge1 8|9 3.71636 0.79304 4.6862 2.783e-06 ***
Judge1 9|10 4.31272 0.79820 5.4031 6.551e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficients:
Estimate Std. Error z value Pr(>|z|)
wl 1 0.54193 0.17837 3.0382 0.00238 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error Structure:
Estimate Std. Error z value Pr(>|z|)
corr Judge1 Judge2 0.459154 0.062913 7.2983 2.915e-13 ***
corr Judge1 Judge3 0.558919 0.047414 11.7880 < 2.2e-16 ***
corr Judge1 Judge4 0.631977 0.048175 13.1182 < 2.2e-16 ***
corr Judge1 Judge5 0.295952 0.089837 3.2943 0.0009866 ***
corr Judge2 Judge3 0.382554 0.056902 6.7230 1.780e-11 ***
corr Judge2 Judge4 0.621086 0.048802 12.7266 < 2.2e-16 ***
corr Judge2 Judge5 0.608996 0.043499 14.0004 < 2.2e-16 ***
corr Judge3 Judge4 0.374733 0.073984 5.0651 4.082e-07 ***
corr Judge3 Judge5 0.368174 0.067220 5.4771 4.323e-08 ***
The probabilities of agreement among all five judges implied by the model can be computed
by the function joint_probabilities():
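One possible way to obtain them is sketched below. It assumes that joint_probabilities()
accepts the fitted model together with one response category per judge via an argument
response.cat and returns one probability per essay; both the argument name and the return
structure should be checked against the package documentation.
R> ## agreement probability per essay: sum over the ten categories k of
R> ## P(all five judges assign category k) -- interface assumed, see above
R> agree <- rowSums(sapply(1:10, function(k)
+    joint_probabilities(res_polycor_wl, response.cat = rep(k, 5))))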
In order to assess the relationship between the average word length and the agreement
probabilities, we plot in Figure 3 the probabilities of agreement implied by the model against
the word length. The plot suggests that the judges tend to agree more on the quality of
essays with lower average word length than on essays with higher average word length.
5. Conclusion
The present paper provides a general overview of the R package mvord, which implements
the estimation of multivariate ordinal probit and logit regression models using the pairwise
likelihood approach. We offer the following features which (to the best of our knowledge)
enhance the currently available software for multivariate ordinal regression models
in R:
• We account for heterogeneity in the error structure among the subjects by allowing the
use of subject-specific covariates in the specification of the error structure.
• The user can impose further restrictions on the threshold and regression parameters in
order to achieve a more parsimonious model (e.g., using one set of thresholds for all
outcomes).
Figure 3: Probabilities of agreement between the five raters plotted against the average word
length (y axis: probability of agreement, x axis: word length).
• We offer the possibility to choose different parameterizations, which are needed in
ordinal models to ensure identifiability.
References
Afonso A, Gomes P, Rother P (2009). “Ordered Response Models for Sovereign Debt Ratings.”
Applied Economics Letters, 16(8), 769–773. doi:10.1080/13504850701221931.
Agresti A (2002). Categorical Data Analysis. 2nd edition. John Wiley & Sons.
Agresti A (2010). Analysis of Ordinal Categorical Data. 2nd edition. John Wiley & Sons.
Alp A (2013). “Structural Shifts in Credit Rating Standards.” The Journal of Finance, 68(6),
2435–2470. doi:10.1111/jofi.12070.
Archer KJ, Hou J, Zhou Q, Ferber K, Layne JG, Gentry AE (2014). “ordinalgmifs: An R
Package for Ordinal Regression in High-dimensional Data Settings.” Cancer Informatics,
13, 187–195. doi:10.4137/CIN.S20806.
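Bartolucci F, Pandolfi S, Pennoni F (2017). “LMest: An R Package for Latent Markov Models
for Longitudinal Continuous and Categorical Data.” Journal of Statistical Software, 81(4).
doi:10.18637/jss.v081.i04.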
Bhat CR, Varin C, Ferdous N (2010). “A Comparison of the Maximum Simulated Likelihood
and Composite Marginal Likelihood Estimation Approaches in the Context of the Multi-
variate Ordered-Response Model.” In Maximum Simulated Likelihood Methods and Appli-
cations, pp. 65–106. Emerald Group Publishing Limited. doi:10.1108/S0731-9053(2010)
0000026007.
Blume ME, Lim F, Mackinlay AC (1998). “The Declining Credit Quality of U.S. Corporate
Debt: Myth or Reality?” The Journal of Finance, 53(4), 1389–1413. doi:10.1111/
0022-1082.00057.
Bürkner PC (2017). “brms: An R Package for Bayesian Multilevel Models Using Stan.”
Journal of Statistical Software, 80(1), 1–28. doi:10.18637/jss.v080.i01.
Campbell JY, Hilscher J, Szilagyi J (2008). “In Search of Distress Risk.” The Journal of
Finance, 63(6), 2899–2939. doi:10.3386/w12362.
Carroll N (2016). oglmx: Estimation of Ordered Generalized Linear Models. R package version
2.0.0.1, URL https://CRAN.R-project.org/package=oglmx.
Christensen RHB (2015a). Analysis of Ordinal Data with Cumulative Link Models - Estima-
tion with the R-Package ordinal. URL https://CRAN.R-project.org/package=ordinal.
Christensen RHB (2015b). “ordinal – Regression Models for Ordinal Data.” R package version
2015.6-28, URL https://CRAN.R-project.org/package=ordinal.
Fox J (2016). polycor: Polychoric and Polyserial Correlations. R package version 0.7-9, URL
https://CRAN.R-project.org/package=polycor.
Genz A, Kenkel B (2015). pbivnorm: Vectorized Bivariate Normal CDF. R package version
0.6.0, URL https://CRAN.R-project.org/package=pbivnorm.
Greene WH, Hensher DA (2010). Modeling Ordered Choices: A Primer. Cambridge University
Press.
Harrell Jr FE (2017). rms: Regression Modeling Strategies. R package version 5.1-1, URL
https://CRAN.R-project.org/package=rms.
Hedeker D, Archer KJ, Nordgren R, Gibbons RD (2015). mixor: Mixed-Effects Ordinal Re-
gression Analysis. R package version 1.0.3, URL https://CRAN.R-project.org/package=
mixor.
Hirk R, Hornik K, Vana L (2017). mvord: An R Package for Fitting Multivariate Ordinal Re-
gression Models. R package version 0.2.1, URL https://CRAN.R-project.org/package=
mvord.
Hoetker G (2007). “The Use of Logit and Probit Models in Strategic Management Research:
Critical Issues.” Strategic Management Journal, 28(4), 331–343. doi:10.1002/smj.582.
Johnson VE, Albert J (1999). Ordinal Data Modeling. Statistics for Social Science and Public
Policy. Springer-Verlag New York Incorporated.
Kenne Pagui EC, Canale A (2016). “Pairwise Likelihood Inference for Multivariate Ordi-
nal Responses with Applications to Customer Satisfaction.” Applied Stochastic Models in
Business and Industry, 32(2), 273–282. doi:10.1002/asmb.2147.
Kenne Pagui EC, Canale A, Genz A, Azzalini A (2014). PLordprob: Multivariate Or-
dered Probit Model via Pairwise Likelihood. R package version 1.0, URL https://CRAN.
R-project.org/package=PLordprob.
Liu X (2009). “Ordinal Regression Analysis: Fitting the Proportional Odds Model Using
Stata, SAS and SPSS.” Journal of Modern Applied Statistical Methods, 8(2), 30. doi:
10.22237/jmasm/1257035340.
Malik HJ, Abraham B (1973). “Multivariate Logistic Distributions.” The Annals of Statistics,
1(3), 588–590. doi:10.1214/aos/1176342430.
Martin AD, Quinn KM, Park JH (2011). “MCMCpack: Markov Chain Monte Carlo in R.”
Journal of Statistical Software, 42(9), 22. doi:10.18637/jss.v042.i09.
McCullagh P (1980). “Regression Models for Ordinal Data.” Journal of the Royal Statistical
Society. Series B (Methodological), pp. 109–142. URL http://www.jstor.org/stable/
2984952.
Nash JC (2014). “On Best Practice Optimization Methods in R.” Journal of Statistical
Software, 60(2), 1–14. doi:10.18637/jss.v060.i02.
Nash JC, Varadhan R (2011). “Unifying Optimization Algorithms to Aid Software System
Users: optimx for R.” Journal of Statistical Software, 43(9), 1–14. doi:10.18637/jss.
v043.i09.
Nooraee N, Abegaz F, Ormel J, Wit E, van den Heuvel ER (2016). “An Approximate Marginal
Logistic Distribution for the Analysis of Longitudinal Ordinal Data.” Biometrics, 72(1),
253–261. doi:10.1111/biom.12414.
Peterson B, Harrell FE (1990). “Partial Proportional Odds Models for Ordinal Response
Variables.” Journal of the Royal Statistical Society. Series C (Applied Statistics), 39(2),
205–217. doi:10.2307/2347760.
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2017). nlme: Linear and Nonlinear
Mixed Effects Models. R package version 3.1-131, URL https://CRAN.R-project.org/
package=nlme.
JMP (2018). Version 13.0. SAS Institute Inc., Cary, NC. URL http://www.jmp.com/.
Python Software Foundation (2018). Python Language Reference, Version 2.7. Wilmington,
DE. URL http://www.python.org/.
R Core Team (2018). R: A Language and Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
SAS Institute Inc (2018). SAS/STAT Software, Version 9.4. SAS Institute Inc., Cary, NC.
URL http://www.sas.com/.
SPSS Inc (2018). IBM SPSS Statistics 24. SPSS Inc., Chicago, IL. URL http://www.spss.
com/.
Puccia M, Collett LA, Kernan P, Palmer AD, Mettrick MS, Deslondes G (2013). “Request
for Comment: Corporate Criteria.” Technical report, Standard and Poor’s Rating Services.
Scott DM, Kanaroglou PS (2002). “An Activity-Episode Generation Model that Captures
Interactions between Household Heads: Development and Empirical Analysis.” Transporta-
tion Research Part B: Methodological, 36(10), 875 – 896. doi:10.1016/S0191-2615(01)
00039-X.
StataCorp (2018). Stata Statistical Software: Release 15. StataCorp LLC, College Station,
TX. URL https://www.stata.com/.
Varin C (2008). “On Composite Marginal Likelihoods.” AStA Advances in Statistical Analysis,
92(1), 1. ISSN 1863-818X. doi:10.1007/s10182-008-0060-7.
Varin C, Czado C (2009). “A Mixed Autoregressive Probit Model for Ordinal Longitudinal
Data.” Biostatistics, pp. 1–12. doi:10.1093/biostatistics/kxp042.
Varin C, Reid N, Firth D (2011). “An Overview of Composite Likelihood Methods.” Statistica
Sinica, 21(1), 5–42. URL http://www.jstor.org/stable/24309261.
Varin C, Vidoni P (2005). “A Note on Composite Likelihood Inference and Model Selection.”
Biometrika, 92(3), 519–528. doi:10.1093/biomet/92.3.519.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. 4th edition. Springer-
Verlag, New York. URL http://www.stats.ox.ac.uk/pub/MASS4.
Yee TW (2010). “The VGAM Package for Categorical Data Analysis.” Journal of Statistical
Software, 32(10), 1–34. doi:10.18637/jss.v032.i10.
Affiliation:
Rainer Hirk
Department of Finance, Accounting and Statistics
Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
1020 Vienna, Austria
E-mail: [email protected]
Kurt Hornik
Department of Finance, Accounting and Statistics
Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
1020 Vienna, Austria
E-mail: [email protected]
Laura Vana
Department of Finance, Accounting and Statistics
Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
1020 Vienna, Austria
E-mail: [email protected]