Using Multivariate Statistics - 7th Edition ISBN 0134790545, 9780134790541
5.7 Complete Examples of Regression Analysis 138
5.7.1 Evaluation of Assumptions 139
5.7.1.1 Ratio of Cases to IVs 139
5.7.1.2 Normality, Linearity, Homoscedasticity, and Independence of Residuals 139
5.7.1.3 Outliers 142
5.7.1.4 Multicollinearity and Singularity 144
5.7.2 Standard Multiple Regression 144
5.7.3 Sequential Regression 150
5.7.4 Example of Standard Multiple Regression with Missing Values Multiply Imputed 154
5.8 Comparison of Programs 162
5.8.1 IBM SPSS Package 163
5.8.2 SAS System 165
5.8.3 SYSTAT System 166

6 Analysis of Covariance 167
6.1 General Purpose and Description 167
6.2 Kinds of Research Questions 170
6.2.1 Main Effects of IVs 170
6.2.2 Interactions Among IVs 170
6.2.3 Specific Comparisons and Trend Analysis 170
6.2.4 Effects of Covariates 170
6.2.5 Effect Size 171
6.2.6 Parameter Estimates 171
6.3 Limitations to Analysis of Covariance 171
6.3.1 Theoretical Issues 171
6.3.2 Practical Issues 172
6.3.2.1 Unequal Sample Sizes, Missing Data, and Ratio of Cases to IVs 172
6.3.2.2 Absence of Outliers 172
6.3.2.3 Absence of Multicollinearity and Singularity 172
6.3.2.4 Normality of Sampling Distributions 173
6.3.2.5 Homogeneity of Variance 173
6.3.2.6 Linearity 173
6.3.2.7 Homogeneity of Regression 173
6.3.2.8 Reliability of Covariates 174
6.4 Fundamental Equations for Analysis of Covariance 174
6.4.1 Sums of Squares and Cross-Products 175
6.4.2 Significance Test and Effect Size 177
6.4.3 Computer Analyses of Small-Sample Example 178
6.5 Some Important Issues 179
6.5.1 Choosing Covariates 179
6.5.2 Evaluation of Covariates 180
6.5.3 Test for Homogeneity of Regression 180
6.5.4 Design Complexity 181
6.5.4.1 Within-Subjects and Mixed Within-Between Designs 181
6.5.4.2 Unequal Sample Sizes 182
6.5.4.3 Specific Comparisons and Trend Analysis 185
6.5.4.4 Effect Size 187
6.5.5 Alternatives to ANCOVA 187
6.6 Complete Example of Analysis of Covariance 189
6.6.1 Evaluation of Assumptions 189
6.6.1.1 Unequal n and Missing Data 189
6.6.1.2 Normality 189
6.6.1.3 Linearity 191
6.6.1.4 Outliers 191
6.6.1.5 Multicollinearity and Singularity 192
6.6.1.6 Homogeneity of Variance 192
6.6.1.7 Homogeneity of Regression 193
6.6.1.8 Reliability of Covariates 193
6.6.2 Analysis of Covariance 193
6.6.2.1 Main Analysis 193
6.6.2.2 Evaluation of Covariates 196
6.6.2.3 Homogeneity of Regression Run 196
6.7 Comparison of Programs 200
6.7.1 IBM SPSS Package 200
6.7.2 SAS System 200
6.7.3 SYSTAT System 200

7 Multivariate Analysis of Variance and Covariance 203
7.1 General Purpose and Description 203
7.2 Kinds of Research Questions 206
7.2.1 Main Effects of IVs 206
7.2.2 Interactions Among IVs 207
7.2.3 Importance of DVs 207
7.2.4 Parameter Estimates 207
7.2.5 Specific Comparisons and Trend Analysis 207
7.2.6 Effect Size 208
7.2.7 Effects of Covariates 208
7.2.8 Repeated-Measures Analysis of Variance 208
7.3 Limitations to Multivariate Analysis of Variance and Covariance 208
7.3.1 Theoretical Issues 208
7.3.2 Practical Issues 209
7.3.2.1 Unequal Sample Sizes, Missing Data, and Power 209
7.3.2.2 Multivariate Normality 210
7.3.2.3 Absence of Outliers 210
7.3.2.4 Homogeneity of Variance-Covariance Matrices 210
7.3.2.5 Linearity 211
7.3.2.6 Homogeneity of Regression 211
7.3.2.7 Reliability of Covariates 211
7.3.2.8 Absence of Multicollinearity and Singularity 211
7.4 Fundamental Equations for Multivariate Analysis of Variance and Covariance 212
7.4.1 Multivariate Analysis of Variance 212
vi Contents
14.4 Fundamental Equations for Structural Equations Modeling 535
14.4.1 Covariance Algebra 535
14.4.2 Model Hypotheses 537
14.4.3 Model Specification 538
14.4.4 Model Estimation 540
14.4.5 Model Evaluation 543
14.4.6 Computer Analysis of Small-Sample Example 545
14.5 Some Important Issues 555
14.5.1 Model Identification 555
14.5.2 Estimation Techniques 557
14.5.2.1 Estimation Methods and Sample Size 559
14.5.2.2 Estimation Methods and Nonnormality 559
14.5.2.3 Estimation Methods and Dependence 559
14.5.2.4 Some Recommendations for Choice of Estimation Method 560
14.5.3 Assessing the Fit of the Model 560
14.5.3.1 Comparative Fit Indices 560
14.5.3.2 Absolute Fit Index 562
14.7 Comparison of Programs 607
14.7.1 EQS 607
14.7.2 LISREL 607
14.7.3 AMOS 612
14.7.4 SAS System 612

15 Multilevel Linear Modeling 613
15.1 General Purpose and Description 613
15.2 Kinds of Research Questions 616
15.2.1 Group Differences in Means 616
15.2.2 Group Differences in Slopes 616
15.2.3 Cross-Level Interactions 616
15.2.4 Meta-Analysis 616
15.2.5 Relative Strength of Predictors at Various Levels 617
15.2.6 Individual and Group Structure 617
15.2.7 Effect Size 617
15.2.8 Path Analysis at Individual and Group Levels 617
15.2.9 Analysis of Longitudinal Data 617
15.2.10 Multilevel Logistic Regression 618
15.2.11 Multiple Response Analysis 618
15.3 Limitations to Multilevel Linear Modeling 618
15.3.1 Theoretical Issues 618
15.3.2 Practical Issues 618
15.3.2.1 Sample Size, Unequal-n, and Missing Data 619
15.3.2.2 Independence of Errors 619
15.3.2.3 Absence of Multicollinearity and Singularity 620
15.4 Fundamental Equations 620
15.4.1 Intercepts-Only Model 623
15.4.1.1 The Intercepts-Only Model: Level-1 Equation 623
15.4.1.2 The Intercepts-Only Model: Level-2 Equation 623
15.4.1.3 Computer Analyses of Intercepts-Only Model 624
15.4.2 Model with a First-Level Predictor 627
15.4.2.1 Level-1 Equation for a Model with a Level-1 Predictor 627
15.4.2.2 Level-2 Equations for a Model with a Level-1 Predictor 628
15.4.2.3 Computer Analysis of a Model with a Level-1 Predictor 630
15.4.3 Model with Predictors at First and Second Levels 633
15.4.3.1 Level-1 Equation for Model with Predictors at Both Levels 633
15.4.3.2 Level-2 Equations for Model with Predictors at Both Levels 633
15.4.3.3 Computer Analyses of Model with Predictors at First and Second Levels 634
15.5 Types of MLM 638
15.5.1 Repeated Measures 638
15.5.2 Higher-Order MLM 642
15.5.3 Latent Variables 642
15.5.4 Nonnormal Outcome Variables 643
15.5.5 Multiple Response Models 644
15.6 Some Important Issues 644
15.6.1 Intraclass Correlation 644
15.6.2 Centering Predictors and Changes in Their Interpretations 646
15.6.3 Interactions 648
15.6.4 Random and Fixed Intercepts and Slopes 648
15.6.5 Statistical Inference 651
15.6.5.1 Assessing Models 651
15.6.5.2 Tests of Individual Effects 652
15.6.6 Effect Size 653
15.6.7 Estimation Techniques and Convergence Problems 653
15.6.8 Exploratory Model Building 654
15.7 Complete Example of MLM 655
15.7.1 Evaluation of Assumptions 656
15.7.1.1 Sample Sizes, Missing Data, and Distributions 656
15.7.1.2 Outliers 659
15.7.1.3 Multicollinearity and Singularity 659
15.7.1.4 Independence of Errors: Intraclass Correlations 659
15.7.2 Multilevel Modeling 661
15.8 Comparison of Programs 668
15.8.1 SAS System 668
15.8.2 IBM SPSS Package 670
15.8.3 HLM Program 671
15.8.4 MLwiN Program 671
15.8.5 SYSTAT System 671

16 Multiway Frequency Analysis 672
16.1 General Purpose and Description 672
16.2 Kinds of Research Questions 673
16.2.1 Associations Among Variables 673
16.2.2 Effect on a Dependent Variable 674
16.2.3 Parameter Estimates 674
16.2.4 Importance of Effects 674
16.2.5 Effect Size 674
16.2.6 Specific Comparisons and Trend Analysis 674
16.3 Limitations to Multiway Frequency Analysis 675
16.3.1 Theoretical Issues 675
16.3.2 Practical Issues 675
16.3.2.1 Independence 675
16.3.2.2 Ratio of Cases to Variables 675
16.3.2.3 Adequacy of Expected Frequencies 675
16.3.2.4 Absence of Outliers in the Solution 676
16.4 Fundamental Equations for Multiway Frequency Analysis 676
16.4.1 Screening for Effects 678
16.4.1.1 Total Effect 678
16.4.1.2 First-Order Effects 679
16.4.1.3 Second-Order Effects 679
16.4.1.4 Third-Order Effect 683
16.4.2 Modeling 683
16.4.3 Evaluation and Interpretation 685
16.4.3.1 Residuals 685
16.4.3.2 Parameter Estimates 686
16.4.4 Computer Analyses of Small-Sample Example 690
16.5 Some Important Issues 695
16.5.1 Hierarchical and Nonhierarchical Models 695
16.5.2 Statistical Criteria 696
16.5.2.1 Tests of Models 696
16.5.2.2 Tests of Individual Effects 696
16.5.3 Strategies for Choosing a Model 696
16.5.3.1 IBM SPSS HILOGLINEAR (Hierarchical) 697
A.6 Matrix "Division" (Inverses and Determinants) 786
A.7 Eigenvalues and Eigenvectors: Procedures for Consolidating Variance from a Matrix 788

Appendix B
Research Designs for Complete Examples 791
B.1 Women's Health and Drug Study 791
B.2 Sexual Attraction Study 793
B.3 Learning Disabilities Data Bank 794
B.4 Reaction Time to Identify Figures 794
B.5 Field Studies of Noise-Induced Sleep Disturbance 795
B.6 Clinical Trial for Primary Biliary Cirrhosis 795
B.7 Impact of Seat Belt Law 795
B.8 The Selene Online Educational Game 796

Appendix C
Statistical Tables 797
C.1 Normal Curve Areas 798
C.2 Critical Values of the t Distribution for α = .05 and .01, Two-Tailed Test 799
C.3 Critical Values of the F Distribution 800
C.4 Critical Values of Chi Square (χ²) 804
C.5 Critical Values for Squared Multiple Correlation (R²) in Forward Stepwise Selection: α = .05 805
C.6 Critical Values for Fmax (s²max/s²min) Distribution for α = .05 and .01 807

References 808
Index 815
Preface

Some good things seem to go on forever: friendship and updating this book. It is difficult to believe that the first edition manuscript was typewritten, with real cutting and pasting. The publisher required a paper manuscript with numbered pages; that was almost our downfall. We could write a book on multivariate statistics, but we couldn't get the same number of pages (about 1200, double-spaced) twice in a row. SPSS was in release 9.0, and the other program we demonstrated was BMDP. There were a mere 11 chapters, of which 6 described techniques. Multilevel and structural equation modeling were not yet ready for prime time. Logistic regression and survival analysis were not yet popular.

Material new to this edition includes a redo of all SAS examples, with a pretty new output format and replacement of interactive analyses that are no longer available. We've also re-run the IBM SPSS examples to show the new output format. We've tried to update the references in all chapters, including only classic citations if they date prior to 2000. New work on relative importance has been incorporated in multiple regression, canonical correlation, and logistic regression analysis, complete with demonstrations. Multiple imputation procedures for dealing with missing data have been updated, and we've added a new time-series example, taking advantage of an IBM SPSS expert modeler that replaces previous tea-leaf reading aspects of the analysis.

Our goals in writing the book remain the same as in all previous editions: to present complex statistical procedures in a way that is maximally useful and accessible to researchers who are not necessarily statisticians. We strive to be short on theory but long on conceptual understanding. The statistical packages have become increasingly easy to use, making it all the more critical to make sure that they are applied with a good understanding of what they can and cannot do. But above all else: what does it all mean?

We have not changed the basic format underlying all of the technique chapters, now 14 of them. We start with an overview of the technique, followed by the types of research questions the techniques are designed to answer. We then provide the cautionary tale: what you need to worry about and how to deal with those worries. Then come the fundamental equations underlying the technique, which some readers truly enjoy working through (we know because they helpfully point out any errors and/or inconsistencies they find); but other readers discover they can skim (or skip) the section without any loss to their ability to conduct meaningful analysis of their research. The fundamental equations are in the context of a small, made-up, usually silly data set for which computer analyses are provided, usually IBM SPSS and SAS. Next, we delve into issues surrounding the technique (such as different types of the analysis, follow-up procedures to the main analysis, and effect size, if it is not amply covered elsewhere). Finally, we provide one or two full-bore analyses of an actual real-life data set, together with a Results section appropriate for a journal. Data sets for these examples are available at www.pearsonhighered.com in IBM SPSS, SAS, and ASCII formats. We end each technique chapter with a comparison of features available in IBM SPSS, SAS, SYSTAT, and sometimes other specialized programs. SYSTAT is a statistical package that we reluctantly had to drop a few editions ago for lack of space.

We apologize in advance for the heft of the book; it is not our intention to line the coffers of chiropractors, physical therapists, acupuncturists, and the like, but there's really just so much to say. As to our friendship, it's still going strong despite living in different cities. Art has taken the place of creating belly dance costumes for both of us, but we remain silly in outlook, although serious in our analysis of research.

The lineup of people to thank grows with each edition, far too extensive to list: students, reviewers, editors, and readers who send us corrections and point out areas of confusion. As always, we take full responsibility for remaining errors and lack of clarity.

Barbara G. Tabachnick
Linda S. Fidell
Chapter 1
Introduction
Learning Objectives
1.1 Explain the importance of multivariate techniques in analyzing research
data
1.2 Describe the basic statistical concepts used in multivariate analysis
1.3 Explain how multivariate analysis is used to determine relationships
between variables
1.4 Summarize the factors to be considered for the selection of variables in
multivariate analysis
1.5 Summarize the importance of statistical power in research study design
1.6 Describe the types of data sets used in multivariate statistics
1.7 Outline the organization of the text
1 Chapter 17 attempts to foster such insights.
In psychology, for example, we are less and less enamored of the simple, clean, laboratory study, in which pliant, first-year college students each provide us with a single behavioral measure on cue.
In nonexperimental (correlational or survey) research, the levels of the IV(s) are not manipulated by the researcher. The researcher can define the IV, but has no control over the assignment of cases to levels of it. For example, groups of people may be categorized into geographic area of residence (Northeast, Midwest, etc.), but only the definition of the variable is under researcher control. Except for the military or prison, place of residence is rarely subject to manipulation by a researcher. Nevertheless, a naturally occurring difference like this is often considered an IV and is used to predict some other nonexperimental (dependent) variable such as income. In this type of research, the distinction between IVs and DVs is usually arbitrary, and many researchers prefer to call IVs predictors and DVs criterion variables.

In nonexperimental research, it is very difficult to attribute causality to an IV. If there is a systematic difference in a DV associated with levels of an IV, the two variables are said (with some degree of confidence) to be related, but the cause of the relationship is unclear. For example, income as a DV might be related to geographic area, but no causal association is implied.
Nonexperimental research takes many forms, but a common example is the survey. Typically, many people are surveyed, and each respondent provides answers to many questions, producing a large number of variables. These variables are usually interrelated in highly complex ways, but univariate and bivariate statistics are not sensitive to this complexity. Bivariate correlations between all pairs of variables, for example, could not reveal that the 20 to 25 variables measured really represent only two or three "supervariables."
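The idea that many correlated measures may collapse into a few "supervariables" can be sketched with a small simulation. This is an illustration only, not from the text: the item counts, noise level, and the use of principal components (eigenvalues of the correlation matrix) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents = 500

# Simulate ten survey items driven by just two latent "supervariables"
# plus a little measurement noise (hypothetical values).
latent = rng.normal(size=(n_respondents, 2))
loadings = rng.normal(size=(2, 10))
items = latent @ loadings + 0.3 * rng.normal(size=(n_respondents, 10))

# Eigenvalues of the correlation matrix show how much variance each
# principal component consolidates; here two components dominate,
# even though ten variables were "measured."
corr = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
share = eigvals[:2].sum() / eigvals.sum()
print(f"First two components account for {share:.0%} of the variance")
```

A pairwise correlation table for these ten items would show many moderate correlations without revealing that only two underlying dimensions are at work; the eigenvalue pattern makes that structure visible.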
If a research goal is to distinguish among subgroups in a sample (e.g., between Catholics and Protestants) on the basis of a variety of attitudinal variables, we could use several univariate t tests (or analyses of variance) to examine group differences on each variable separately. But if the variables are related, which is highly likely, the results of many t tests are misleading and statistically suspect.
With the use of multivariate statistical techniques, complex interrelationships among variables are revealed and assessed in statistical inference. Further, it is possible to keep the overall Type I error rate at, say, 5%, no matter how many variables are tested.
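The arithmetic behind inflated error rates, and one simple fix, can be sketched as follows. The Bonferroni-style correction shown here is offered as an illustration of the general idea, not as the specific method the book recommends, and it assumes (for simplicity) that the tests are independent.

```python
# Familywise Type I error when each of k tests is run at alpha = .05.
# With independent tests, P(at least one false rejection) = 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 5, 20):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests: familywise Type I error = {familywise:.3f}")

# A Bonferroni-style correction divides alpha by the number of tests,
# holding the familywise rate at or below the nominal 5%.
k = 20
per_test = alpha / k
print(f"per-test alpha = {per_test:.4f}, "
      f"familywise error = {1 - (1 - per_test) ** k:.4f}")
```

With 20 separate tests at alpha = .05, the chance of at least one spurious "significant" result is well over half, which is why many separate t tests on related variables are statistically suspect.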
Although most multivariate techniques were developed for use in nonexperimental research, they are also useful in experimental research, in which there may be multiple IVs and multiple DVs. With multiple IVs, the research is usually designed so that the IVs are independent of each other and a straightforward correction for numerous statistical tests is available (see Chapter 3). With multiple DVs, a problem of inflated error rate arises if each DV is tested separately. Further, at least some of the DVs are likely to be correlated with each other, so separate tests of each DV reanalyze some of the same variance. Therefore, multivariate tests are used.
Experimental research designs with multiple DVs were unusual at one time. Now, however, with attempts to make experimental designs more realistic, and with the availability of computer programs, experiments often have several DVs. It is dangerous to run an experiment with only one DV and risk missing the impact of the IV because the most sensitive DV is not measured. Multivariate statistics help the experimenter design more efficient and more realistic experiments by allowing measurement of multiple DVs without violation of acceptable levels of Type I error.
One of the few considerations not relevant to choice of statistical technique is whether the data are experimental or correlational. The statistical methods "work" whether the researcher manipulated the levels of the IV or not. But attribution of causality to results is crucially affected by the experimental-nonexperimental distinction.
If you have access to both packages, you are indeed fortunate. Programs within the packages do not completely overlap, and some problems are better handled through one package than the other. For example, doing several versions of the same basic analysis on the same set of data is particularly easy with IBM SPSS, whereas SAS has the most extensive capabilities for saving derived scores from data screening or from intermediate analyses.
Chapters 5 through 17 (the chapters that cover the specialized multivariate techniques) offer explanations and illustrations of a variety of programs2 within each package and a comparison of the features of the programs. We hope that once you understand the techniques, you will be able to generalize to virtually any multivariate program.
Recent versions of the programs are available in Windows, with menus that implement most of the techniques illustrated in this book. All of the techniques may be implemented through syntax, and syntax itself is generated through menus. Then you may add or change syntax as desired for your analysis. For example, you may "paste" menu choices into a syntax window in IBM SPSS, edit the resulting text, and then run the program. Also, syntax generated by IBM SPSS menus is saved in the "journal" file (statistics.jnl), which may also be accessed and copied into a syntax window. Syntax generated by SAS menus is recorded in a "log" file. The contents may then be copied to an interactive window, edited, and run. Do not overlook the help files in these programs. Indeed, SAS and IBM SPSS now provide the entire set of user manuals online, often with more current information than is available in printed manuals.
Our IBM SPSS demonstrations in this book are based on syntax generated through menus whenever feasible. We would love to show you the sequence of menu choices, but space does not permit. And, for the sake of parsimony, we have edited program output to illustrate the material that we feel is the most important for interpretation.
With commercial computer packages, you need to know which version of the package you are using. Programs are continually being changed, and not all changes are immediately implemented at each facility. Therefore, many versions of the various programs are simultaneously in use at different institutions; even at one institution, more than one version of a package is sometimes available.
Program updates are often corrections of errors discovered in earlier versions. Sometimes, a new version will change the output format but not its information. Occasionally, though, there are major revisions in one or more programs or a new program is added to the package. Sometimes defaults change with updates, so that the output looks different although syntax is the same. Check to find out which version of each package you are using. Then, if you are using a printed manual, be sure that the manual you are using is consistent with the version in use at your facility. Also check updates for error correction in previous releases that may be relevant to some of your previous runs.
Except where noted, this book reviews Windows versions of IBM SPSS Version 24 and SAS Version 9.4. Information on availability and versions of software, macros, books, and the like changes almost daily. We recommend the Internet as a source of "keeping up."
2 We have retained descriptions of features of SYSTAT (Version 13) in these sections, despite the removal of
in the data that would be obvious if the data were processed by hand are less easy to spot when processing is entirely by computer. But the computer packages have programs to graph and describe your data in the simplest univariate terms and to display bivariate relationships among your variables. As discussed in Chapter 4, these programs provide preliminary analyses that are absolutely necessary if the results of multivariate programs are to be believed.
There are also certain costs associated with the benefits of using multivariate procedures. Benefits of increased flexibility in research design, for instance, are sometimes paralleled by increased ambiguity in interpretation of results. In addition, multivariate results can be quite sensitive to which analytic strategy is chosen (cf. Section 1.2.4) and do not always provide better protection against statistical errors than their univariate counterparts. Add to this the fact that occasionally you still cannot get a firm statistical answer to your research questions, and you may wonder if the increase in complexity and difficulty is warranted.
Frankly, we think it is. Slippery as some of the concepts and procedures are, these statistics provide insights into relationships among variables that may more closely resemble the complexity of the "real" world. And sometimes you get at least partial answers to questions that could not be asked at all in the univariate framework. For a complete analysis, making sense of your data usually requires a judicious mix of multivariate and univariate statistics.
The addition of multivariate statistical methods to your repertoire makes data analysis a lot more fun. If you liked univariate statistics, you will love multivariate statistics!3