
Regression Analysis and Multivariate Analysis
Antoni J. Duleba, M.D., and David L. Olive, M.D.

ABSTRACT—Proper evaluation of data does not necessarily require the use of advanced statistical methods; however, such advanced tools offer the researcher the freedom to evaluate more complex hypotheses. This overview of regression analysis and multivariate statistics describes general concepts. Basic definitions and conventions are reviewed. The types of regression analysis are then discussed, including simple regression, multiple regression, multivariate multiple regression, and logistic regression. The various steps required to perform these analyses are described, and the advantages and disadvantages of each are detailed.

KEY WORDS: clinical trials, statistical analysis, regression, multivariate analysis

INTRODUCTION

Statistical analysis is an essential component of a vast majority of research endeavors. There are at least two major incentives to improve the understanding of statistical methods: the increasing "statistical sophistication" of the scientific community (including the reviewers of grants and journals) and the need to evaluate results of complex studies. These increasing demands and expectations behoove researchers to venture beyond the familiar t test or χ2 test into the territory of more advanced statistical methods. Proper evaluation of data does not necessarily require the use of advanced methods; such methods are not inherently better than the basic ones. However, more advanced statistical tools offer the researcher the freedom to evaluate more complex hypotheses, and may protect from drawing false conclusions imposed by the limitations of basic statistics. Familiarity with a wide range of statistical methods is invaluable both before and after conducting the study. Ideally, studies should be designed with a clear understanding of the advantages and limitations of statistical methods intended for data analysis. Consequently, the actual analysis of the results may be the easiest part of the work.

In recent years, access to a broad range of statistical methods has been relatively easy, thanks to tremendous advances in microcomputer technology and the availability of powerful statistical programs such as SAS, SPSS, or SYSTAT. To properly use these programs, the users must overcome what appear to be two diametrically opposed adversaries: fear and overconfidence. Fear may be instilled by forbidding jargon and the complexities of formal mathematical aspects of the analysis. Overconfidence, on the other hand, may arise from the deceptive simplicity of user-friendly computer packages allowing one to easily use and misuse powerful tests. These pitfalls can be avoided by understanding the general concepts of statistical methods. Furthermore, better understanding of statistical terminology may demystify the manuals accompanying statistical packages and improve communication with statisticians.

This overview is intended as a guide to regression analysis and selected aspects of multivariate analysis. We will attempt to describe some general concepts and to stress intuitive rather than mathematical aspects of several statistical methods. At the outset, we will briefly review some basic definitions and concepts pertaining to identification of various types of variables, data collection, and statistical inference. We will also discuss the role of type and number of variables in the selection of an appropriate statistical method.

Department of Obstetrics & Gynecology, Yale University School of Medicine, New Haven, Connecticut
Reprint requests: Dr. Duleba, Dept. of Obstetrics & Gynecology, Yale University School of Medicine, 333 Cedar Street, P.O. Box 208063, New Haven, CT, 06520-8063
Copyright ©1996 by Thieme Medical Publishers, Inc., 381 Park Avenue South, New York, NY 10016. All rights reserved.
SEMINARS IN REPRODUCTIVE ENDOCRINOLOGY Volume 14, Number 2 May 1996

Table 1. Examples of Statistical Methods Applicable to Analysis of Different Types of Variables

                          Dependent Variable
Independent Variable      Nominal or Ordinal           Interval or Ratio
Nominal or ordinal        χ2 test                      Student's t test
                          Other nonparametric tests    Analysis of variance
Interval or ratio         Logistic regression          Regression analysis
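The mapping in Table 1 can be written down as a small lookup. The sketch below is purely illustrative — the type labels and the `candidate_methods` helper are our own naming for this example, not part of any statistical package.

```python
# Table 1 as a lookup: (independent type, dependent type) -> candidate methods.
# The string labels below are illustrative names, not a library API.

TABLE_1 = {
    ("categorical", "categorical"): ["chi-square test", "other nonparametric tests"],
    ("categorical", "quantitative"): ["Student's t test", "analysis of variance"],
    ("quantitative", "categorical"): ["logistic regression"],
    ("quantitative", "quantitative"): ["regression analysis"],
}

# Nominal and ordinal variables are both categorical;
# interval and ratio variables are both quantitative.
COLLAPSE = {"nominal": "categorical", "ordinal": "categorical",
            "interval": "quantitative", "ratio": "quantitative"}

def candidate_methods(independent, dependent):
    """Return the methods Table 1 suggests for the given variable types."""
    return TABLE_1[(COLLAPSE[independent], COLLAPSE[dependent])]

for pair in [("nominal", "ratio"), ("interval", "ordinal"), ("ratio", "interval")]:
    print(pair, "->", candidate_methods(*pair))
```

Note that only the roles and measurement scales of the variables enter the lookup — which is exactly the point of Table 1.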

DEFINITIONS AND CONVENTIONS


Variables

Variables may be classified as qualitative or quantitative. Qualitative (categorical) variables may be nominal or ordinal. Nominal variables merely classify the elements of the sample into at least two categories without assigning a rank to the categories (eg, sex or race). Ordinal variables assign order to their categories (eg, American Fertility Society classification of endometriosis). Quantitative variables are always numerical and may be interval or ratio. A variable is interval when the interval between the measures is meaningful but the ratio is not (eg, temperature expressed in degrees Celsius). The zero point on the scale of an interval variable is not really meaningful, eg, 0°C does not mean that there is "no temperature." Ratio variables are characterized by a meaningful zero point and a meaningful ratio between the measures (eg, weight).

Another important classification pertains to the relationship between variables. In this context, variables may be classified as dependent or independent. Dependent variables are measured responses or outcomes. Independent variables are predictors of outcomes. Independent variables may be varied by the experimenter directly (by modulating the experimental conditions) or indirectly (by selecting subjects with various characteristics). Thus, in a study assessing the effect of two drugs on hirsutism, the independent variable would be nominal (the selected drug), while the dependent variable may be the Ferriman-Gallwey score.

Many statistical methods require clear distinction between the dependent and independent variables; in some settings, however, this distinction may be blurred or entirely irrelevant. Table 1 presents examples of statistical methods appropriate for analysis of various types of independent and dependent variables. Table 2 compares univariate statistical methods (evaluating a single dependent variable) with their multivariate counterparts (evaluating more than one dependent variable).

In many settings, especially observational studies, the assignment of a variable as dependent or independent is arbitrary and may reflect the researcher's bias. For example, in a study evaluating patients with polycystic ovary syndrome, fasting insulin level may be studied as a function of free testosterone level and age; implicit in this model is dependence of insulin (dependent variable) on testosterone (independent variable). Alternatively, the same data may be analyzed by modeling of free testosterone level as a function of fasting insulin level; in this model, the level of testosterone is a dependent variable, while insulin is an independent variable.

Importantly, the cause-effect (directional) relationship between predictors and outcomes may be truly assessed in those studies in which independent variables are under direct control of the researcher. When the researcher only observes and records a collection of variables, the relationship between independent and dependent variables may be described more appropriately in terms of association rather than cause and effect.

Table 2. Comparison of Univariate and Multivariate Analytical Methods

                                     Dependent Variable(s)
Independent Variable                 Single (univariate analysis)   At least two (multivariate analysis)
Categorical, consisting of           Student's t test               Hotelling's T2
  two groups                                                        Discriminant function analysis
Categorical, consisting of           Analysis of variance           Multivariate analysis of variance
  at least three groups                                             Discriminant function analysis
Quantitative, or a combination       Univariate regression          Multivariate regression analysis
  of quantitative and qualitative      analysis                     Canonical correlation

Data Collection

Prior to analysis, the data are usually organized in a table (spreadsheet). Typically, each row of the spreadsheet lists data pertaining to one observation or one subject, while each column lists values assigned to an individual variable. Such organization


of the data facilitates computer analysis, especially when the goal of the analysis is to explore the relationship between the variables. When repeat measures are carried out on the same subject (eg, before and after treatment), a separate variable may be assigned to each measurement. In this format, paired or repeat measures tests may be easily performed.

Statistical Inference

One of the major goals of statistics is to infer the characteristics of the population from observations collected in a sample. This inference may be achieved by the process of testing hypotheses. Typically, the null hypothesis (H0) is presented and tested by a given test statistic. The null hypothesis is rejected when its testing demonstrates that it is improbable. Thus, the researcher decides how stringent the testing should be, ie, what is the acceptable risk of falsely (by chance alone) rejecting the null hypothesis. This level of probability (level of significance) is called the alpha (α) level. The error of incorrectly rejecting the null hypothesis is referred to as type I error.

α = P (type I error)
  = P (erroneously rejecting the null hypothesis).

When the null hypothesis is rejected, the alternative hypothesis (Ha, a hypothesis encompassing all alternatives to H0) is accepted. The null hypothesis may not be rejected in two instances: when it is true, or when it is false but the study lacks sufficient power to reject it. The probability of type II error (β error) is the probability of not rejecting the null hypothesis when the alternative hypothesis is true.

β = P (type II error)
  = P (not rejecting the null hypothesis when the alternative hypothesis is true).

The power of the study is equal to 1 − β and represents the probability of the test correctly rejecting the null hypothesis.

REGRESSION ANALYSIS

Regression analysis explores relationships between one or more dependent (response) variables and one or more independent (predictor) variables. Dependent and independent variables are usually, but not always, quantitative (ie, interval or ratio). The goal of regression analysis is to express the dependent variable(s) as a function of the independent variable(s). Such a function may be used to describe the type and the strength of association between the variables, as well as to predict the values of dependent variables for a given set of values of independent variables.

Most commonly, responses are modeled as linear functions of predictors, and hence linear regression models are evaluated. The advantages of using linear models are ease of interpretation and ease of mathematical manipulation. Several types of regression analysis may be distinguished:

1. Simple regression of one dependent variable and one independent variable.
2. Multiple regression of one dependent variable and several independent variables.
3. Multivariate multiple regression of several dependent variables and several independent variables.

Dependent variables in the above types of regression analysis are quantitative. When the dependent variable is categorical, the data may be analyzed by a special type of regression called logistic regression. In the following sections we will focus on univariate regression analysis encompassing simple and multiple linear regression analysis and logistic regression. Multivariate multiple regression will be discussed separately in the section devoted to multivariate analysis.

Simple Linear Regression Analysis

The relationship between a single dependent variable and a single independent variable may be graphically approximated as a line. This line may be described by an equation:

y = β0 + β1 ∙ x

where y and x are, respectively, dependent and independent variables. The y intercept of the line is represented by the constant β0; it represents the value of the dependent variable when the independent variable is equal to zero. The slope of the line is represented by the unstandardized regression coefficient β1; it reflects the ratio of the change in the dependent variable to a given change in the independent variable. Clearly, when β1 is equal to zero, changes in the independent variable are not associated with any change in the dependent variable; under such circumstances, the independent variable is a useless predictor of outcome. A positive value of β1 reflects positive correlation of predictor and outcome; a negative value of β1 reflects negative correlation.

For example, simple linear regression may be used

Figure 1. Regression analysis models. (a) Simple linear regression. (b) Multivariate regression with a categorical variable (country). (c) Multivariate regression with an interaction element (country and calendar year).

to model the relationship between the age at menarche and the calendar year (Fig. 1a):

Age at menarche = β0 + β1 ∙ (calendar year).

In this case, the unstandardized regression coefficient assumes a negative value: the dependent variable (age at menarche) decreases when the independent variable (calendar year) increases.

The above model represents a deterministic relationship and therefore only approximates reality; real experimental data cannot be expected to fit this model perfectly. Real data would contain an element of unexplained variation, and therefore they may be represented by a probabilistic relationship:

Age at menarche = β0 + β1 ∙ (calendar year) + random error.

The selection of the best fitting line is performed by the least squares approach, which identifies the constant β0 and the unstandardized regression coefficient β1 for which the sum of squares of the errors is minimized. In this context, individual errors are the differences between observed and expected values of the dependent variable.
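The least squares computation just described can be sketched in a few lines of Python. The data below are hypothetical, chosen only to mimic a declining trend in the age at menarche, and the `least_squares` helper is our own, not a library function.

```python
# Least-squares fit of y = b0 + b1*x: choose b0 and b1 so that the sum of
# squared errors (observed minus expected y) is minimized. Closed form:
#   b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b0 = mean_y - b1 * mean_x

def least_squares(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the point of means
    return b0, b1

# Hypothetical observations: calendar year vs. age at menarche (years)
years = [1900, 1920, 1940, 1960, 1980]
ages = [14.5, 14.1, 13.6, 13.2, 12.8]

b0, b1 = least_squares(years, ages)
errors = [yi - (b0 + b1 * xi) for xi, yi in zip(years, ages)]  # individual errors

print(round(b1, 4))  # -0.0215: a negative slope, as in the menarche example
```

One property worth checking in any implementation: the individual errors of a least-squares fit sum to (numerically) zero, which makes a quick sanity test.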


The strength of the linear relationship may be measured by the Pearson product moment coefficient of correlation, r, which may also be referred to as a correlation coefficient. The correlation coefficient may assume values from −1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).

A useful independent variable in the regression model requires its correlation coefficient to be significantly different from zero. Testing of this null hypothesis may be carried out with the aid of the t statistic.

The square of the correlation coefficient is called the coefficient of determination, r2. The r2 represents the proportion of the total sample variability explained by the linear model. Thus, for example, an r2 = .78 means that 78% of the sample variability is explained by the model. Statistical software packages provide a summary of the most important parameters of linear regression: the constant, the regression coefficient, the probability of type I error, and the coefficient of determination.

Once a best fitting model has been found, it may be used to estimate the mean value of a dependent variable (and the confidence interval [CI] of the mean) for a given value of an independent variable. Thus, for example, one may estimate the average age at menarche for a given calendar year and describe the CI for this average. The CI of this average will describe the boundaries within which the mean age at menarche is expected to be found upon repeated sampling of the population of women.

Furthermore, the model may be used to predict the new individual value of a dependent variable (and the prediction interval of this value) for a given value of an independent variable. Thus, for a given calendar year we may also predict the expected age at menarche for an individual and the CI of this prediction. While the individual value of the age at menarche for a given calendar year will be the same as the estimate of the mean, the CI will be wider. This is intuitively obvious, since the prediction of the new individual value carries two errors: the error of estimating the mean and the random error of the predicted individual value.

Pitfalls

Predictions of the value of a dependent variable are usually safer when performed by interpolation rather than by extrapolation, eg, the models presented in Figure 1 should not be used to predict the age at menarche in the year 1000.

Regression analysis evaluates association; it does not provide direct evidence for or against the cause-and-effect relationship between variables. Such causation may be inferred based on the study design, especially when independent variables were under direct control of the researcher.

A small study, with only a few observations, may provide an unreliable model that cannot be reproduced upon repeat sampling. In particular, models derived from small studies should be cross-validated, ie, tested on a new set of observations from the same population.

Finally, it is not sufficient to demonstrate a statistically significant association between an independent and a dependent variable (ie, a low probability that the regression coefficient is equal to zero) and a high value of the coefficient of determination r2. It is also essential to demonstrate the real, intuitive importance of the observed association. Large and powerful studies may demonstrate a statistically significant but clinically irrelevant association. The clinical relevance of the association between the independent and the dependent variable may be assessed by examining the value of the unstandardized regression coefficient.

Multiple Linear Regression Analysis

Multiple linear regression is a natural extension of simple linear regression, when the dependent variable is described as a linear function of more than one independent variable.

For example, expanding on the model presented for simple linear regression, we may describe the age at menarche as a function of two independent variables, the calendar year and the percent of body fat:

Age at menarche = β0 + β1 ∙ (calendar year) + β2 ∙ (percent of body fat).

We know that the age at menarche has decreased over the recent decades. We also know the age at menarche correlates negatively with the percent of body fat. Incorporation into the model of both independent variables allows us to determine the relative predictive power of each variable, accounting for the other one. Thus, we may be able to answer questions: which independent variable correlates best with the outcome, and whether (accounting for the more powerful predictor) the other independent variable has any additional predictive power.

In the above equation, β0 is a constant, while β1 and β2 are unstandardized regression coefficients. Each of these coefficients corresponds to the slope of the line described in the discussion of simple linear regression. The equation with two independent variables may be graphically represented by a plane in three-dimensional space. When the multiple linear regression equation contains more than two independent variables, graphic presentation in multidimensional space cannot be readily accomplished.

Unstandardized regression coefficients are helpful in evaluating the practical relevance of the model. As discussed in the previous section on simple regression, it is important to determine whether unstandardized regression coefficients have practical relevance to a given model.

In a model containing i independent variables, the overall regression hypothesis tests the global null hypothesis that:

β1 = β2 = . . . = βi = 0.

The alternative hypothesis states that at least one of the regression coefficients is nonzero; this alternative hypothesis states that the model is "useful" (but not necessarily good). The null hypothesis may be tested using F statistics. When the global null hypothesis is rejected, each of the individual regression coefficients may be tested for significance (ie, whether any given βi = 0).

When evaluating a given model, we may want to compare the predictive power of independent variables. Direct comparison of unstandardized regression coefficients cannot be done, since the values of β1 and β2 are affected by the choice of units in which individual variables are measured. To determine how predictive a given variable is, we may assess the effect of a standardized change of an independent variable, eg, a change by one standard deviation. Such standardization can be accomplished by converting each independent variable to have a mean of 0 and standard deviation of 1; this procedure allows determination of standardized regression coefficients (beta-weights) that can be meaningfully compared with each other.

In the discussion of simple linear regression, we discussed the coefficient of determination, r2, as a measure of the proportion of the total sample variability explained by the linear model. In the multiple regression model, we can determine the multiple coefficient of determination, R2, which also describes the fraction of total variability explained by the model. Furthermore, the change of R2 is also meaningful. The addition of another independent variable to the model should increase the value of R2. This change of R2 reflects the additional contribution of the newly added variable to the overall proportion of the variance explained by the new, enriched model. For example, we may compare the models:

Age at menarche = β0 + β1 ∙ (calendar year)

and

Age at menarche = β0 + β1 ∙ (calendar year) + β2 ∙ (% body fat).

Assume that the R2 of the first model is .45 and that the R2 of the second model is .57. Thus, the second model (ie, adding a second predictor, % body fat) explains an additional 12% of variation (change of R2 by .12).

The addition of more independent variables into the equation always increases the value of R2; this occurs, however, at the cost of decreasing the stability of the model. The real challenge in multiple linear regression analysis is to find a meaningful and stable model with a high R2.

Models

In this section we will review some basic concepts of building linear regression models. In particular, we will discuss selection and transformation of variables, interactions between variables, and various approaches to building models.

Selection of Independent Variables

The most important step in regression analysis is the appropriate selection of independent variables. The variables may be conceptually divided into two categories: hypothetical predictors and historically important predictors. Hypothetical predictors are variables not previously known (or tested) in evaluating a given outcome. For example, we may choose to test the effect of a daily dose of spinach on the age at menarche. Testing hypothetical variables alone is often unrewarding, not only because the hypothesis may be wrong, but also because the models would be missing other important predictors that should be accounted for when evaluating a new, hypothetical predictor. Thus, even an important variable may not be identified as a good predictor of outcome, unless it is stratified against (an)other appropriate variable(s).1 The reverse of this scenario is also possible, whereby any predictive power of a new hypothetical predictor may disappear after accounting for other variables.

Thus, a more meaningful approach to evaluating the effect of spinach on age at menarche would be to study a model such as:

Age at menarche = β0 + β1 ∙ (calendar year) + β2 ∙ (% body fat) + β3 ∙ (daily dose of spinach).

Testing of a large number of variables is tempting; however, it carries a risk of producing models that cannot be reproduced on new samples of observations. A practical recommendation is to assure that the sample size (the number of observations) is at least 5 to 10 times the number of independent variables.

Figure 2. Examples of nonlinear relations between variables. (a) Quadratic. (b) Cuboidal. (c) Logarithmic. (d) Exponential.

Transformations

In many instances the relationship between the dependent and independent variable is not a linear

one (Fig. 2). Can we still perform linear regression? The answer is yes, but only when the independent variable is reexpressed. In other words, when the outcome is better predicted by the square or logarithm of the original independent variable, we may model the outcome as a linear function of the new, squared, or logarithmically transformed variable. In some instances, when the outcome has a complex curved relationship with a predictor, we may use polynomials to approximate this relationship. Polynomials are functions:

y = α + β1 ∙ x + β2 ∙ x2 + . . . + βk ∙ xk.

Reexpression of a given predictor variable as several new variables created in this fashion may improve the fit of the model.

Sometimes the relationship of a quantitative independent variable and the outcome is best approximated by a binary function. The most important step, the selection of threshold values for dichotomization, may be performed using modified receiver operating characteristic curve analysis.2 In this method, the sum of the true positive rate and true negative rate is plotted as a function of different threshold values of a continuous variable. The cutoff point for dichotomization corresponds to the peak of such a plot.

Transformations may also be performed on dependent variables. This may be appropriate when several independent variables demonstrate the same type of nonlinear relationship with the dependent variable.

Residuals

Residuals are differences between observed and predicted (estimated by the model) values of the dependent variable. Thus, a separate residual may be calculated for each observation. Inspection of residuals is an essential component of regression analysis; in particular, it is helpful in identifying outliers and improving the fit of the model.

Outliers are observations with large residuals; in other words, they are observations that do not fit well to the model. Outliers may have two origins: they may arise from errors (ie, mistakes in measuring or documenting), or they may signify unusual observations not explainable by the model. Erroneous observations may be corrected or excluded. Unusual but correct observations may require closer scrutiny and, possibly, a revision of the model.

Inspection of residuals can occasionally detect previously unsuspected relationships between independent and dependent variables. For example, a previously unsuspected quadratic component of the relationship may become obvious by noticing a U-shaped (or inverted U-shaped) relationship in the graph where a residual is plotted as a function of a given predictor.

When the inspection of residuals suggests that a given observation be removed, a variable be transformed, or a new variable (such as a quadratic component) added, the new, revised model should be identified and its residuals examined.

Various types of residuals may be calculated. Raw residuals are expressed in the same units as the original dependent variables. Plots of raw residuals as functions of their corresponding independent variables may uncover nonrandom (but not yet accounted for) relationships between a given predictor and response. Studentized residuals are scaled by dividing each raw residual by its estimated standard deviation.3 Studentized residuals are particularly helpful in identifying the outliers. Moderately large Studentized residuals, greater than two standard deviations, deserve closer scrutiny; Studentized residuals greater than three standard deviations point at significant outliers.

Both raw and Studentized residuals may be calculated as deleted residuals.4 Each deleted residual represents a difference between the observed value of a dependent variable and its predicted value, when the prediction equation is derived with that observation deleted. Plots of deleted residuals vs raw residuals are particularly helpful in identifying outliers that may significantly distort the regression coefficients; such outliers would markedly deviate from the 45° line through the origin of the plot.

An excellent in-depth discussion of this complex topic may be found in the monograph by Gunst and Mason.4

Categorical Variables

Some potential predictors of outcome are inherently non-numerical, eg, sex or race. These variables may be used in linear regression analysis with the aid of coding. Coding allows expression of categorical variables as one or more of so-called dummy or design independent variables. For example, when the categorical variable is dichotomous, coding consists of creating a single design variable D that may assume values of 0 or 1.

The situation becomes slightly more complex when there are three or more categories; for a given number n of categories, we need to assign n − 1 design variables. For example, we may choose to study the age at menarche as a function of calendar year and the country of origin. When studying four countries, we need to create three design variables (Table 3). The corresponding regression model would be as follows:

Age at menarche = β0 + β1 ∙ (D1) + β2 ∙ (D2) + β3 ∙ (D3) + β4 ∙ (calendar year).

Table 3. Specification of the Design Variables for Country, Using Country A as the Reference

Country (Code)    D1    D2    D3
Country A          0     0     0
Country B          1     0     0
Country C          0     1     0
Country D          0     0     1

Thus, for country A, the model would be reduced to:

Age at menarche = β0 + β1 ∙ (D1 = 0) + β2 ∙ (D2 = 0) + β3 ∙ (D3 = 0) + β4 ∙ (calendar year)
                = β0 + β4 ∙ (calendar year)

while for country B, the model would be reduced to:

Age at menarche = β0 + β1 ∙ (D1 = 1) + β2 ∙ (D2 = 0) + β3 ∙ (D3 = 0) + β4 ∙ (calendar year)
                = β0 + β1 + β4 ∙ (calendar year).

Thus, for each country, we may model a relationship between the age at menarche and the calendar year as a separate line with the same slope but a different constant (y intercept). This example is illustrated in Fig. 1b.

Interactions

The effect of a given independent variable may be influenced by the value of another independent variable. For example, we may easily imagine that, over the last century, changes in the age at menarche were different for different countries. Let's assume


that the age at menarche remained constant in country A, but declined in country B. Such an interaction between independent variables may be included in the model (with one design variable D1 coding for two countries):

Age at menarche = β0 + β1 ∙ (D1) + β2 ∙ (calendar year) + β3 ∙ (D1) ∙ (calendar year)

where "(D1) ∙ (calendar year)" is a two-variable interaction term. Such an interaction is illustrated in Figure 1c.

While interactions may improve the fit of the model, they may also unnecessarily complicate it. For example, in a model with only five independent variables, there are 10 possible two-variable interaction terms. There may also be three-variable interaction terms (with multiplication of three independent variables), and so on. Furthermore, an interaction may be expressed in the form of other functions of two or more of the independent variables, eg, their ratios. Interactions should be included in the model when justified by known or suspected relevance to the interpretation of the model, and/or when they appear to significantly improve the fit of the model. A problem with including interactions, however, is that they may introduce redundancy into the model (multicollinearity).

Multicollinearity

Multicollinearity occurs when independent variables in the model contain redundant information. This happens when independent variables are highly correlated with each other. Multicollinearity may be responsible for unreliable models, whereby an important predictor may not be detected, and thus not included in the model, because of its linear relationship with another predictor. In other words, multicollinearity occurs when two or more independent variables contribute to the prediction of the dependent variable, but the contribution of one overlaps with that of the other(s). Models that include redundant variables tend to have unstable estimates of coefficients (ie, the coefficients have large variances).

Multicollinearity may be detected by examination of the correlation matrix of regression coefficients.4 When multicollinearity is limited to two independent variables, it may be detected when the correlation coefficient is in the range of .70 to .80. Multicollinearity among three or more independent variables may not be identified by pairwise screening of correlation coefficients and may require evaluation of other parameters of the correlation matrix of regression coefficients: latent roots (eigenvalues) and condition indices. Condition indices are square roots of the ratios of the largest eigenvalue to each successive eigenvalue. Small latent roots (close to zero) and large condition indices (in the range of at least 15 to 30) suggest a significant multicollinearity problem.

Strategies in Model Building

The process of building the best possible model remains largely in the domain of art rather than science. It requires judgment in the selection of variables and their appropriate transformation, the search for relevant interactions, and accounting for multicollinearity. The list of all variables, polynomials, and interaction components may be long. Excessively complex models lose stability and become nonreproducible on new sets of observations. Thus, the selection of the most relevant components is essential. Several options may be considered.

When dealing with a relatively small set of variables, all possible regressions may be tested, their summary statistics compared, and the best-fitting model selected. The major drawback of this method is the large number of regressions that have to be evaluated: for n variables, there are 2ⁿ different models.

A more efficient approach to the selection of the most relevant predictors is best subset regression.5 The concept of this approach is best presented with an example. Let us assume that we have 10 variables and want to create a model with only 4 variables. The best subset regression method will select 4 variables such that the resultant model has the largest coefficient of determination, R². We may then select the best possible subset of 5 variables and determine whether this larger model results in an appreciably increased R². The procedure may be continued for successively larger subsets until the increases of R² become insignificant or, less likely, until all 10 variables are incorporated.

Another method of selecting the most relevant variables is stepwise selection. This approach uses a fixed decision rule to either add a variable to the model or to delete it.

In forward selection, at each step the single variable with the best predictive power is added to the model. Thus, an F statistic is calculated for each possible step, and the variable associated with the highest significance (lowest P value) is incorporated into the model. After the first variable has been selected, the next variable is selected by evaluating its predictive power while accounting for the effect of the first variable. The process continues until either all variables are incorporated or, more likely, the addition of another variable fails to significantly improve the model (ie, the F statistic demonstrates a P value that exceeds the preset threshold level of type I error). The traditional threshold of P = .05 is probably too stringent and may exclude important variables.6
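The all-possible-regressions and best-subset searches described above can be sketched in a few lines of pure Python. This is a minimal illustration rather than a definitive implementation: the helper names `ols_r2` and `best_subset` and the toy data are invented, ordinary least squares is fit by solving the normal equations directly, and real statistical software would additionally report F statistics and adjusted selection criteria.

```python
from itertools import combinations

def ols_r2(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X)b = X'y
    with Gaussian elimination; return the coefficient of determination R^2."""
    rows = [[1.0] + list(x) for x in X]          # prepend an intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):                         # elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):                 # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    y_hat = [sum(bc * xc for bc, xc in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def best_subset(X, y, size):
    """Among all predictor subsets of a given size, return (R^2, indices)
    for the subset with the largest R^2 (the 2^n search, restricted to one size)."""
    subsets = combinations(range(len(X[0])), size)
    scored = [(ols_r2([[row[i] for i in s] for row in X], y) , s) for s in subsets]
    return max(scored)

# Hypothetical toy data: y depends only on predictors 0 and 2.
X = [[1, 5, 2], [2, 3, 1], [3, 8, 4], [4, 1, 0], [5, 7, 3], [6, 2, 5], [7, 9, 1], [8, 4, 6]]
y = [2 * row[0] + 3 * row[2] for row in X]
r2, chosen = best_subset(X, y, 2)
```

Calling `best_subset` for successively larger sizes and comparing the resulting R² values mirrors the stopping rule described above: stop when the increase in R² becomes negligible.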

SEMINARS IN REPRODUCTIVE ENDOCRINOLOGY Volume 14, Number 2 May 1996

Thus, more lenient threshold levels of the P value, in the range of .15 to .20, should be considered.

An alternative stepwise procedure is the backward elimination method. The initial model consists of all variables, and at each step the least predictive variable (with the corresponding highest P value) is eliminated from the model. The process continues until the least predictive variable is still significantly predictive, ie, its corresponding P value is below the preset threshold level. Finally, both forward and backward steps may be incorporated in the stepwise procedure, whereby each forward step (the addition of a variable to the model) is followed by a backward elimination (a search for a nonsignificant variable to be removed). This process may help in identifying variables that initially appear important but become insignificant when (an)other, subsequently added variable(s) are accounted for. The process requires that two threshold levels of P are selected: Pe for entry of a variable into the model and Pr for removal of a variable from the model. The threshold for removal must exceed the threshold for entry. For example, for an entry threshold set at P = .15, the removal threshold may be set at P = .20. The stepwise procedure is incorporated into most statistical packages.

Since all the above approaches may be performed automatically by the computer program, there is a risk of excluding a variable that appears to be statistically nonsignificant but is known to be important (historically or biologically). Such variables should be forced into the model manually, ie, included in the model even when their contribution to the model is not statistically significant.

Logistic Regression Analysis

In many studies, the outcome of interest cannot be described as a quantitative dependent variable. For example, conditions such as pregnancy, spontaneous abortion, or development of breast cancer are best described by dichotomous (binary) variables that code for the presence or absence of a given outcome. Under such circumstances, the usual linear regression model is not helpful. However, modeling of the relationship between various independent predictors and dichotomous dependent variables may be accomplished using a related statistical method, logistic regression.

A dichotomous outcome may be presented in the form of a conditional mean, E(Y|x), which denotes the expected value of Y for a given value of x. For example, a conditional mean may represent the probability of pregnancy for a female of a given age. For dichotomous outcomes such as pregnancy, the minimum value of the conditional mean is 0 and the maximum value is 1. A conditional mean may be modeled with the aid of the logit transformation:

E(Y|x) = P = exp(β0 + β1 ∙ x)/[1 + exp(β0 + β1 ∙ x)].

The logit transformation may be presented as a logit function:

log[P/(1 − P)] = β0 + β1 ∙ x.

In this form, the logit is a linear function of the independent variable(s) x. In general, logistic regression uses logit transformations to explore models where the dependent variable is categorical and at least one of the independent variables is quantitative (ie, interval or ratio). In most instances the dependent variable is binary. When a categorical dependent variable assumes more than two values, one may use a more complex extension of logistic regression, polytomous logistic regression, a topic beyond the scope of this review. An excellent monograph by Hosmer and Lemeshow discusses the details of logistic regression analysis, including polytomous logistic regression.7

Logistic regression modeling is in many ways comparable to linear regression. It evaluates the linearity of the relationship between the independent variables and the logit of the dependent variable. It accepts categorical independent variables and interaction elements. It allows transformations of variables, and it is vulnerable to multicollinearity. Finally, it uses familiar methods of variable selection: all possible regressions, best subset selection, and various stepwise selection procedures.

Logistic regression is distinct from linear regression in several ways, including the interpretation of the coefficients and the approach to evaluation of the goodness of fit.

One of the most useful features offered by logistic regression is the ability to interpret coefficients (beta-coefficients) as measures of odds ratios of dichotomous outcomes. The odds of a dichotomous outcome (eg, response vs no response) may be represented as the ratio P/(1 − P), where P represents the probability of the response and 1 − P represents the probability of no response. The odds ratio is the multiplicative factor by which the odds change when the independent variable is increased by one unit. Most programs provide an output in the form of estimated coefficients β (± standard error), odds ratios (Ψ), and 95% CIs for the odds ratios.

For a change of one unit of a given independent variable xi, the calculation of Ψ and the 95% CI may be easily accomplished by observing that:

Ψ = exp(βi)

and that the end points of the 95% CI are:

exp[βi ± 1.96 ∙ (standard error of βi)].
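As a small worked illustration of these formulas (the function name, coefficient, and standard error below are invented; any logistic-regression output supplies the real values):

```python
import math

def odds_ratio_ci(beta, se, units=1.0, z=1.96):
    """Odds ratio and 95% CI for a `units`-unit change in the predictor,
    computed as exp(units*beta) and exp(units*beta ± z*units*se)."""
    psi = math.exp(units * beta)
    lo = math.exp(units * beta - z * units * se)
    hi = math.exp(units * beta + z * units * se)
    return psi, lo, hi

# Hypothetical output from a logistic fit: beta = 0.405, standard error = 0.10.
psi, lo, hi = odds_ratio_ci(0.405, 0.10)   # odds ratio ≈ 1.50 per one-unit change
```

Setting `units=c` covers a change of c units as well, since both the coefficient and its standard error scale by c.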


We may also easily determine the Ψ and 95% CI for a change of c units of an independent variable xi:

Ψ(c) = exp(c ∙ βi)

and the end points of the 95% CI are:

exp[c ∙ βi ± 1.96 ∙ c ∙ (standard error of βi)].

In logistic regression, an equivalent of the multiple coefficient of determination R² is McFadden's ρ². McFadden's ρ² may assume values between 0 and 1; the closer to one, the better the fit of the model. Notably, even values in the range of .2 to .4 are considered satisfactory.8

The evaluation of the goodness of fit may be accomplished using the Pearson χ² test to compare the difference between the observed and predicted (fitted by the model) probabilities of the dependent variable. The model fits well when the difference in these probabilities is not statistically significant. A more sophisticated approach, often provided by statistical software, is to perform the Hosmer-Lemeshow test.7 Ultimately, the best evaluation of the model may be achieved by applying the model to a new set of observations.

MULTIVARIATE ANALYSIS

Multivariate analysis simultaneously evaluates multiple dependent variables and accounts for the correlation(s) between the dependent variables. In this section we will compare univariate and multivariate methods and briefly present selected aspects of multivariate analysis. An in-depth presentation of this subject may be found in an excellent monograph by Rencher.9

Comparison of Univariate and Multivariate Analysis

Studies evaluating multiple outcomes are often analyzed using univariate methods whereby each outcome is evaluated separately. This approach, however, has major shortcomings when compared to multivariate analysis, especially due to the errors of multiple comparisons and the inability to account for intercorrelation between the outcomes.

The error of multiple comparisons may be best illustrated by an example. Let's assume that we intend to study the effects of two treatments on 20 hormones. Use of univariate analysis would require us to perform 20 separate comparisons (eg, t tests). If for each test we accept a 5% risk of type I error (P = .05), the probability of finding a significant difference in the levels of at least one hormone by chance alone would be:

P (of at least one rejection by chance alone) = 1 − P (all 20 tests accepted) = 1 − (.95)²⁰ ≈ .64 (!).

Thus, multiple comparisons may result in a totally unacceptable risk of type I error. Importantly, the above calculation assumes independence of the outcomes; such an assumption is rarely fulfilled. In the above example, the levels of hormones are usually highly interdependent; consequently, the true type I error lies somewhere between .05 and .64. When one uses multivariate analysis, the type I error is clearly set and is not obscured by the number of variables; thus, the researcher is protected from reading too much into the data.

The interdependence of outcomes is also the source of the second important problem with univariate analysis: the lack of power to detect differences in outcomes when these differences are obscured by the interdependence of the outcomes or, conversely, the finding of differences that would disappear after accounting for the interdependence of the outcomes.

To illustrate these concerns, let us assume that we are testing a null hypothesis stating that in patients with polycystic ovary syndrome the mean level of luteinizing hormone (LH) is 12 mIU/mL and the mean level of follicle-stimulating hormone (FSH) is 4 mIU/mL. Using univariate statistics, we may construct a separate 95% CI for the means of LH and FSH. On a graph depicting LH on the x-axis and FSH on the y-axis, these 95% CIs may be jointly presented as a rectangle (Fig. 3). However, it is reasonable to expect that, in fact, the levels of LH and FSH are correlated. Thus, the distribution of observed values would not be rectangular, but more likely elliptical (Fig. 3). A multivariate test creates elliptical acceptance regions. Two special situations should be more closely investigated: (1) acceptance by univariate but rejection by multivariate testing, and (2) rejection by univariate but acceptance by multivariate testing. In the former situation, the observed means of the sample would be inside the rectangle but outside the ellipse; in the latter situation, the observed means of the sample would be outside the rectangle but inside the ellipse. In both instances, the results of the multivariate analysis should be more trusted.

In summary, multivariate analysis protects from errors of multiple comparisons, has the power to identify important predictors of outcome that may not be observed when analyzing each outcome separately, and protects from false identification of effects that would lose significance after accounting for other variables.

Hotelling's T²

The simplest situation where multivariate analysis may be used applies to studies comparing two


groups with two or more dependent variables. Such studies may be analyzed with the aid of Hotelling's T² test, a multivariate extension of the t test.

For example, we may analyze an imaginary study comparing the effect of two in vitro fertilization ovulation induction protocols on two outcomes: the number of embryos and the quality of embryos (Fig. 4). The distribution of outcomes in each group is approximated by an ellipse. The size and shape of each ellipse illustrate the relationship between the outcomes. In the example illustrated by Figure 4, the outcomes correlate positively in protocol 1 and negatively in protocol 2. Centroids C1 and C2 represent vectors of means, in this case pairs consisting of the mean number of embryos and the mean quality of embryos. Hotelling's T² test may be used to evaluate the null hypothesis that the centroids C1 and C2 are not different. Finding a significant value of T² allows us to reject the null hypothesis and to state that the centroids are significantly different.

In the above example, the two treatments may differ in three ways: in the number of embryos only, in the quality of embryos only, or in both the number and the quality of embryos. Hotelling's T² test is unable to determine which of these three possibilities takes place. This problem may be solved by discriminant function analysis (see below).

Figure 3. Comparison of acceptance and rejection regions for univariate and multivariate analysis. The rectangle represents a confidence region constructed for each dependent variable (luteinizing hormone [LH] and follicle-stimulating hormone [FSH]) separately by univariate tests. The ellipse represents a confidence region constructed by a multivariate test.

Figure 4. Comparison of two imaginary in vitro fertilization protocols where two outcomes (dependent variables) were recorded: number of embryos and quality of embryos.

Manova

Multivariate analysis of variance (MANOVA) is a natural extension of analysis of variance (ANOVA) to situations where the analysis involves more than one dependent variable. It is useful in situations where evaluation of more than two groups is required.

In ANOVA we may test a null hypothesis that the means (of a dependent variable) for all groups are equal. Rejection of the null hypothesis means that the means of the dependent variable are different in at least two groups.

In MANOVA, the null hypothesis may be extended to all dependent variables, ie, we may postulate that for each dependent variable, the means for all groups are equal. Thus, for k groups and p dependent variables, the null hypothesis may be expressed as follows:

µ11 = µ21 = µ31 = . . . = µk1
µ12 = µ22 = µ32 = . . . = µk2
. . .
µ1p = µ2p = µ3p = . . . = µkp.

There are several different statistics available to test the null hypothesis in MANOVA. Probably the most popular test is known as Wilks' Λ. In contrast to the familiar univariate statistics (such as the t test or F statistic), the null hypothesis is rejected when the value of Λ is low and approaches zero. Critical values of Λ are available in table form10 or may be obtained using statistical software such as SYSTAT


or SAS. Other tests frequently used in MANOVA (and found in the outputs of statistical packages) include Roy's largest root test, the Pillai statistic, and the Lawley-Hotelling statistic.9

Why so many statistics? All these tests have the same type I error rate, ie, when the null hypothesis is true, they all have the same risk of rejecting it. However, when the null hypothesis is false, these tests differ in their ability to reject it (ie, they differ in power). Thus, under some circumstances, some tests will reject the null hypothesis while the others will not. The power of each of these tests depends on the configuration of the mean vectors in multidimensional space (since we are testing means of several dependent variables, we may represent them as mean vectors). Each mean vector represents one group. We may imagine that, for example, the mean vectors of three groups may be collinear (in the same line) or diffuse. When the mean vectors are collinear, the most powerful test is Roy's largest root test. On the other hand, when the mean vectors are diffuse, the Pillai statistic is most powerful. Further discussion of this subject may be found in the specialized literature.9 In practice, under most circumstances, use of Wilks' Λ is recommended.

The finding of a significant value of Wilks' Λ (or another comparable test) allows the rejection of the null hypothesis. However, these tests, like Hotelling's T² test, are not helpful in the actual location of the source of the difference between the means; this may be accomplished by using discriminant function analysis.

Discriminant Function Analysis

Discriminant functions are linear combinations of dependent variables that best separate groups. Thus, discriminant function analysis identifies the relative contribution of each individual dependent variable to the separation of the groups.

For example, to evaluate a study testing the effect of two in vitro fertilization ovulation induction protocols on the number of embryos and the quality of embryos, one may obtain a discriminant function:

z = a1 ∙ number of embryos + a2 ∙ quality of embryos.

The values of the coefficients a1 and a2 are such that they optimize the separation between the individual ovulation induction protocols. Variables may be standardized to have means of zero and standard deviations of one. This allows direct comparison between the individual weights. Coefficients close to zero indicate that their corresponding variables provide little contribution to the separation between the groups.

Graphically, two groups may be best separated by a line, 1, through the points where the ellipses cross each other (Fig. 5). A line, 2, perpendicular to line 1, is called the discriminant function; it allows the representation of pairs of dependent variables as simple numbers (values of the discriminant function) in such a way that the two groups may be best separated.

Figure 5. Discriminant function optimizes the separation between the groups.

In situations where there are more than two groups, a single discriminant function is clearly not sufficient; evaluation of k groups requires k − 1 discriminant functions. The most powerful discriminant function separates one of the groups from all other groups. The next discriminant function separates the second group from the remaining ones, and so on. These individual discriminant functions may be ranked according to their power to separate a given group from the remaining groups. Each discriminant function may be evaluated by the Wilks' Λ test to determine its significance, ie, whether it significantly contributes to the separation between the groups.

In the presence of more than two dependent variables, graphic representation of discriminant functions cannot be readily accomplished; however, the concepts and the calculations of the function parameters remain essentially unchanged.

However, the greater the number of variables and groups, the less stable are the solutions. Decreased stability means that on cross-validation on a new sample of observations, the results of discriminant function analysis are less likely to be reproducible. The rule of thumb is to have at least 10 observations (subjects) per dependent variable. To limit the number of variables to the most important ones, various stepwise procedures may be used. In this context, important variables are those that are the most helpful in separating the individual groups. The results of stepwise selection should be interpreted with caution; these procedures may exclude not only unimportant variables, but also important ones, when they are highly correlated with other variable(s) already incorporated into the discriminant function. This situation is analogous to that encountered with multicollinearity of independent variables in regression analysis.

Multivariate Regression Analysis

While MANOVA is a multivariate technique evaluating categorical independent variables, multivariate regression analysis is a multivariate technique evaluating quantitative independent variables (Table 2). Specifically, multivariate regression analysis determines the strength of the linear relationship between multiple independent and multiple dependent variables. It is therefore an extension of multiple regression analysis that produces models with the same number of equations as the number of dependent variables.

For example, for three dependent variables (age at menarche, age at adrenarche, and age at telarche) we may produce a model:

Age at menarche = β0 + β1 ∙ (calendar year) + β2 ∙ (percent of body fat)
Age at adrenarche = β0 + β1 ∙ (calendar year) + β2 ∙ (percent of body fat)
Age at telarche = β0 + β1 ∙ (calendar year) + β2 ∙ (percent of body fat).

As in univariate regression models, the β's are regression coefficients, and the least squares method estimates the model with the best-fitting regression coefficients. Multivariate regression analysis allows testing whether at least one of the outcomes is correlated with at least one of the predictors. This testing, the test of overall regression, evaluates the null hypothesis that all of the regression coefficients (in the above example, β1 and β2) are equal to zero; rejection implies that at least one of them differs from zero. Various tests of overall regression may be used: Wilks' Λ, Roy's largest root, the Pillai statistic, and the Lawley-Hotelling statistic. As in MANOVA, the selection of the most powerful test depends on the relationship between the dependent variables.

These tests inform us of the presence or absence of a linear relationship between independent and dependent variables somewhere in the model. Canonical correlation may be used to examine this relationship more closely. Canonical correlation requires the construction of the first canonical variates, linear functions of the independent and of the dependent variables. For example:

Canonical variate A (for independent variables) = B1A ∙ (calendar year) + B2A ∙ (percent of body fat).
Canonical variate B (for dependent variables) = B1B ∙ (age at menarche) + B2B ∙ (age at adrenarche) + B3B ∙ (age at telarche).

The canonical correlation method selects the best coefficients B for both the independent and the dependent variables to maximize the correlation between these two sets of variables. When the variables are standardized, their corresponding standardized coefficients show the contribution of individual variables to the overall relationship between the independent and dependent variables. Further analysis of canonical correlation allows delineation of the details of the relationship between the variables. An excellent discussion of this subject may be found in the monograph by Rencher.9

WHAT IS NEXT?

In addition to the techniques discussed in this article, there are other methods of multivariate analysis. These techniques allow grouping of objects/observations according to their similarity (cluster analysis) or identification of the most important underlying factors responsible for the relationship between the variables (factor analysis).

These methods and other statistical approaches may be used not only to rigidly test our hypotheses, but also, and possibly even more importantly, to help us in generating new hypotheses. Thus applied, statistical tools may also be tools of creativity.

REFERENCES

1. Simpson EM: The interpretation of interaction in contingency tables. J R Stat Soc Series B 13:238-241, 1951
2. Silverberg KM, Burns WN, Olive DL, Riehl RM, Schenken RS: Serum progesterone levels predict success of in vitro fertilization/embryo transfer in patients stimulated with leuprolide acetate and human menopausal gonadotropins. J Clin Endocrinol Metab 73:797-803, 1991
3. Velleman PF, Welsch RE: Efficient computing of regression diagnostics. Am Statistician 35:234-242, 1981
4. Gunst RF, Mason RL: Regression Analysis and Its Application: A Data-Oriented Approach. New York, Marcel Dekker, Inc, 1980
5. La Motte LR: The SELECT routines: A program for identifying best subset regression. Applied Stat 21:1972
6. Bendel RB, Afifi AA: Comparison of stopping rules in forward regression. J Am Stat Assoc 72:46-53, 1977
7. Hosmer DW, Lemeshow S: Applied Logistic Regression. New York, John Wiley & Sons, Inc, 1989
8. Hensher D, Johnson LW: Applied Discrete Choice Modelling. London, Croom Helm, 1981
9. Rencher AC: Methods of Multivariate Analysis. New York, John Wiley & Sons, Inc, 1995
10. Wall FJ: The Generalized Variance Ratio or U-Statistic. Albuquerque, The Dikewood Corp, 1967
