MULTIPLE REGRESSION IN PRACTICE
WILLIAM D. BERRY
STANLEY FELDMAN
Series: Quantitative Applications in the Social Sciences
A SAGE UNIVERSITY PAPER 50

SAGE UNIVERSITY PAPERS
Series: Quantitative Applications in the Social Sciences
Series Editor: Michael S. Lewis-Beck, University of Iowa

Editorial Consultants
Richard A. Berk, Sociology, University of California, Los Angeles
William D. Berry, Political Science, Florida State University
Kenneth A. Bollen, Sociology, University of North Carolina, Chapel Hill
Linda B. Bourque, Public Health, University of California, Los Angeles
Jacques A. Hagenaars, Social Sciences, Tilburg University
Sally Jackson, Communications, University of Arizona
Richard M. Jaeger (recently deceased), Education, University of North Carolina, Greensboro
Gary King, Department of Government, Harvard University
Roger E. Kirk, Psychology, Baylor University
Helena Chmura Kraemer, Psychiatry and Behavioral Sciences, Stanford University
Peter Marsden, Sociology, Harvard University
Helmut Norpoth, Political Science, SUNY, Stony Brook
Frank L. Schmidt, Management and Organization, University of Iowa
Herbert Weisberg, Political Science, The Ohio State University

Publisher
Sara Miller McCune, Sage Publications, Inc.

INSTRUCTIONS TO POTENTIAL CONTRIBUTORS
For guidelines on submission of a monograph proposal to this series, please write
Michael S. Lewis-Beck, Editor
Sage QASS Series
Department of Political Science
University of Iowa
Iowa City, IA 52242

Series / Number 07-050
MULTIPLE REGRESSION IN PRACTICE
WILLIAM D. BERRY
STANLEY FELDMAN
University of Kentucky
SAGE PUBLICATIONS
The International Professional Publishers
Newbury Park London New Delhi

Copyright © 1985 by Sage Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information:
Sage Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

Sage Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom

Sage Publications India Pvt. Ltd.
M-32 Market
Greater Kailash I
New Delhi 110 048 India

Printed in the United States of America
Library of Congress Catalog Card No. 85050543
ISBN 0-8039-2054-7
This book is printed on acid-free paper.
04 05 06 15 14 13 12
Acquiring Editor: C. Deborah Laughton
Editorial Assistant: Eileen Carr

When citing a university paper, please use the proper form. Remember to cite the Sage University Paper series title and include the paper number. One of the following formats can be adapted (depending on the style manual used):

(1) BERRY, W. D., & FELDMAN, S. (1985). Multiple Regression in Practice. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-050. Newbury Park, CA: Sage.

OR

(2) Berry, W. D., & Feldman, S. (1985). Multiple regression in practice. (Sage University Paper series on Quantitative Applications in the Social Sciences, 07-050). Newbury Park, CA: Sage.

CONTENTS
Series Editor's Introduction
Introduction
Acknowledgment
1. The Multiple Regression Model: A Review
2. Specification Error
   Consequences of Specification Error
   An Illustration of Specification Error: Satisfaction with Life
   Detecting and Dealing with Specification Error
3. Measurement Error
   Consequences of Measurement Error
   An Illustration of Measurement Error: Satisfaction with Life
   Detecting Measurement Error
   Dealing with Measurement Error
4. Multicollinearity
   Consequences of Multicollinearity
   Detecting High Multicollinearity
   An Illustration of Multicollinearity: Satisfaction with Life
   Dealing with Multicollinearity
5. Nonlinearity and Nonadditivity
   Detecting Nonlinearity and Nonadditivity
   Dealing with Nonlinearity
   Dealing with Nonadditivity
   Some Warnings about Nonlinear and Nonadditive Specifications
6. Heteroscedasticity and Autocorrelation
   When Heteroscedasticity and Autocorrelation Can Be Expected
   Consequences of Heteroscedasticity and Autocorrelation
   Detecting Heteroscedasticity
   An Illustration of Heteroscedasticity: Income and Housing Consumption
   Dealing with Heteroscedasticity and Autocorrelation
7. Additional Concerns
Notes
References
About the Authors

Series Editor's Introduction

Multiple regression analysis is one of the most popular statistical estimation procedures in the social sciences. In response to this fact, we have already published two regression-related monographs in this series, Applied Regression by Michael Lewis-Beck and Interpreting and Using Regression by Christopher Achen. The former provides a basic introduction to the procedure whereas the latter examines how and under what circumstances regression is actually put to use in good social science research.

In Multiple Regression in Practice, William Berry and Stanley Feldman provide a systematic treatment of many of the major problems encountered in using regression analysis. Because it is likely that one or more of the assumptions of the regression model will be violated in a specific empirical analysis, the ability to know when problems exist and to take appropriate action helps to ensure the proper use of the procedure. Responding to this need for understanding, Berry and Feldman clearly and concisely discuss the consequences of violating the assumptions of the regression model, procedures for detecting when such violations exist, and strategies for dealing with these problems when they arise. The monograph thus takes the reader a long way in understanding the major problems posed, and potential solutions to those problems, when actually using multiple regression to test social science hypotheses.

In order to make the presentation as accessible as possible, the monograph was written without the use of matrix algebra.
And, whenever possible, the notation used is consistent with Lewis-Beck's Applied Regression. Because both the present volume and that by Achen assume a basic level of familiarity with regression analysis, they both make excellent companion and follow-up works to Lewis-Beck's introduction.

Berry and Feldman illustrate the problems facing researchers and the solutions they offer with numerous examples from political science, sociology, and economics. Because many applications of regression in the social sciences involve analysis of samples of cases randomly drawn from a larger population, some of the major examples are constructed to show more clearly the properties of regression estimates derived from samples. Specifically, Berry and Feldman explain clearly the concepts of bias and efficiency in statistical estimation, key concepts that are often confusing to students. By making use of repeated sampling from a known population, the properties of sample estimators are much easier to understand.

In short, Multiple Regression in Practice should be a valuable aid for anyone making use of regression analysis in their research or anyone simply interested in understanding more fully this important statistical procedure.

Richard G. Niemi
Series Co-Editor

Introduction

Multiple regression analysis is an important tool for social scientists in the analysis of nonexperimental data. When the assumptions of regression analysis are met, the coefficient estimates derived for a random sample will have many desirable properties. In the real world of research, however, one or more of these assumptions are likely to be violated. And when this occurs, the application of regression analysis may produce misleading or problematic coefficient estimates. If no useful results could be drawn when assumptions are violated, or no modifications of regression analysis could be made to deal with the violations, the attractiveness of multiple regression would be reduced significantly.
Fortunately, there are means to detect when some of the assumptions are violated and procedures that can be employed to deal with the resulting problems. In Chapters 2 through 6 of this monograph, we will analyze the key assumptions of multiple regression analysis in a systematic manner. For each assumption, we will discuss the situations in which the assumption is likely to be violated, what effect violating the assumption has on the nature of coefficient estimates, how the violation can be detected in actual research, and what can be done to overcome the problems that result when the assumption is violated.

We will illustrate many of our points with examples. Because, in most social science applications, regression analysis will be applied to a sample of cases from a population, we will frequently illustrate the properties of regression coefficient estimators by defining a "population" of cases and drawing a large number of random samples from this population. These examples will help show the problems that may arise when the coefficients of a regression equation are estimated from a single random sample. Although the "populations" so defined will be drawn from actual data sets, the cases will be selected to illustrate particular statistical issues. Thus, the substantive results presented in these illustrations should not be interpreted as necessarily representative of any "real-world" population.

We assume that readers of this monograph have had a prior introduction to regression analysis comparable to the level of Lewis-Beck's (1980) monograph, Applied Regression: An Introduction. Although we will present an introduction to the multiple regression model in Chapter 1, this is intended as a review of fundamentals and not as a complete introduction to the subject. Because we are assuming nothing more than an introduction to multiple regression, we will not use matrix algebra in our presentation.
We hope this will make our discussion of the application of the multiple regression model accessible to as many people as possible.

Acknowledgment

We would like to thank Steve Thomson for his assistance in designing computer programs to run regressions on repeated samples from a population; he saved us considerable time and effort. We are also indebted to Tse-min Lin for detection of an error in our discussion of the consequences of heteroscedasticity in an earlier printing.

W.D.B. and S.F.

MULTIPLE REGRESSION IN PRACTICE
WILLIAM D. BERRY
Florida State University
STANLEY FELDMAN
SUNY at Stony Brook

1. THE MULTIPLE REGRESSION MODEL: A REVIEW

In the general form of the linear regression model, the dependent variable, $Y$, is assumed to be a function of a set of $k$ independent variables ($X_1, X_2, X_3, \ldots, X_k$) in a population. To express the model in equation form, we use $X_{ij}$ to denote the value of the $j$th observation of the variable $X_i$. The linear regression model assumes that for each set of values for the $k$ independent variables ($X_{1j}, X_{2j}, \ldots, X_{kj}$) there is a distribution of $Y_j$ values such that the mean of the distribution is on the surface represented by the equation

$$E(Y_j) = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \cdots + \beta_k X_{kj}$$  [1.1]

where the Greek letter coefficients $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ represent population parameters. The interpretation of these parameters is straightforward. $\beta_i$ is called a partial slope coefficient, as it is what mathematicians call the slope of the relationship between the independent variable $X_i$ and the dependent variable $Y$ holding all other independent variables constant.
Put differently, $\beta_i$ represents the change in $E(Y)$ (the expected value of $Y$) associated with a one-unit increase in $X_i$ when all other independent variables in the model are held constant.¹ $\alpha$, on the other hand, is called the intercept, and represents geometrically the value of $E(Y)$ where the regression surface (or plane) crosses the $Y$ axis, or substantively, the expected value of $Y$ when all the independent variables equal zero.

Each individual observation of $Y_j$ is assumed to be determined by an equation containing an error term:

$$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \cdots + \beta_k X_{kj} + \varepsilon_j$$  [1.2]

Thus, the error term $\varepsilon_j$ is the deviation of the value of $Y_j$ from the mean value of the distribution obtained by repeated observation of $Y$ values for cases each with fixed values for each of the independent variables. This error term may be conceived as representing (1) the effects on $Y$ of variables not explicitly included in the equation, and (2) a residual random element in the dependent variable.

For much of this monograph it will be unnecessary to specify the model in terms of a specific observation, so for ease of notation we will often drop the subscript $j$. This leaves us with a population regression equation of

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon = \alpha + \sum_{i=1}^{k} \beta_i X_i + \varepsilon$$  [1.3]

Although implicit in the way the regression equation is written, we should note that it is assumed that the relationship between $E(Y)$ and each $X_i$ is linear, and that the effects of the $k$ independent variables are additive. (A more detailed discussion of the meaning and implications of linearity and additivity is contained in Chapter 5.) In addition, several other assumptions must be met to be able to appropriately estimate the population parameters and conduct tests of statistical significance. They are as follows:

(1) All variables must be measured at the interval level and without error.
(2) For each set of values for the $k$ independent variables ($X_{1j}, X_{2j}, \ldots, X_{kj}$), $E(\varepsilon_j) = 0$ (i.e., the mean value of the error term is 0).
(3) For each set of values for the $k$ independent variables, $\mathrm{VAR}(\varepsilon_j) = \sigma^2$ (i.e., the variance of the error term is constant).
(4) For any two sets of values for the $k$ independent variables, $\mathrm{COV}(\varepsilon_j, \varepsilon_h) = 0$ (i.e., the error terms are uncorrelated; thus there is no autocorrelation).
(5) For each $X_i$, $\mathrm{COV}(X_i, \varepsilon) = 0$ (i.e., each independent variable is uncorrelated with the error term).
(6) There is no perfect collinearity: no independent variable is perfectly linearly related to one or more of the other independent variables in the model.
(7) For each set of values for the $k$ independent variables, $\varepsilon_j$ is normally distributed.

These are the basic assumptions of the multiple regression model; problems associated with the violation of these assumptions will form the basis of the subsequent chapters of this monograph. The problem of measurement error (assumption 1) will be considered in Chapter 3. If the variance of the error term is not constant (assumption 3), one is faced with heteroscedasticity, discussed in Chapter 6. The violation of assumption 4 (autocorrelation) is also considered in Chapter 6. When the independent variables are correlated with the error term (assumption 5), the result is specification error, which is dealt with in Chapter 2. The problem of multicollinearity (assumption 6) is discussed in Chapter 4.

Assumption 2 states that the mean value of the error term is zero. This should be of concern only when the analyst is interested in the precise value of the intercept. If this assumption is violated, the intercept is the only coefficient of the regression model that is affected. Finally, assumption 7 states that the error term must be normally distributed. This assumption is necessary only for tests of statistical significance; its violation will have no effect on the estimation of the parameters of the regression model.
It is quite fortunate that normality is not required for estimation, because it is often very difficult to defend this assumption in practice. Furthermore, even to justify tests of significance, the normality assumption is critical only with small samples. In large samples, we can rely on the central limit theorem to ensure that even if the error term is not normally distributed in the population, the sampling distribution of a partial slope coefficient estimator will be approximately normally distributed (see Hanushek and Jackson, 1977: 68). As Bohrnstedt and Carter (1971) have shown, regression analysis is quite robust against violations of normality, and thus significance tests can be done in large samples even when this assumption cannot be justified substantively.

Parameter estimation. In most situations, we are not in a position to determine the population parameters directly; instead we must estimate their values using data from a finite sample (of size $n$) from the population. To distinguish it from the population regression equation, the sample regression model will be written as

$$Y_j = a + b_1 X_{1j} + b_2 X_{2j} + \cdots + b_k X_{kj} + e_j$$  [1.4]

The most common way of estimating the values of $a$ and the $b_i$ ($i = 1, 2, \ldots, k$) is to employ the least squares criterion, that is, to use ordinary least squares (OLS) regression. To do this we find those values of $a, b_1, b_2, \ldots, b_k$ that minimize the sum of the squared deviations of the observations, $Y_j$, from the predicted values of $Y$, $\hat{Y}_j$:

$$\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$$  [1.5]

where

$$\hat{Y}_j = a + \sum_{i=1}^{k} b_i X_{ij}$$  [1.6]

For the bivariate model with slope $b_1$ and intercept $a$,

$$Y = a + b_1 X_1 + e$$  [1.7]

the value of $b_1$ that minimizes

$$\sum_{j=1}^{n} (Y_j - a - b_1 X_{1j})^2$$  [1.8]

is given by

$$b_1 = \frac{\sum_{j=1}^{n} x_j y_j}{\sum_{j=1}^{n} x_j^2} = \frac{\sum_{j=1}^{n} (X_{1j} - \bar{X}_1)(Y_j - \bar{Y})}{\sum_{j=1}^{n} (X_{1j} - \bar{X}_1)^2}$$  [1.9]

where $y_j = Y_j - \bar{Y}$ and $x_j = X_{1j} - \bar{X}_1$. Once $b_1$ is known, $a$ can be computed from

$$a = \bar{Y} - b_1 \bar{X}_1$$  [1.10]

For the general case (with $k$ independent variables), the formulas for the parameter estimators $a, b_1, b_2, \ldots, b_k$ are sufficiently complicated to require matrix algebra (see Hanushek and Jackson, 1977, Chapter 5, for the general formula).

Sampling error.
When estimating a population parameter from a sample it is important not only to derive a specific value, but also to estimate the effect of sampling error on the estimate. To accomplish this, it is necessary to consider the concept of a sampling distribution for a regression coefficient. This can be most easily understood as the distribution of the estimates of the regression coefficient that would result if samples of a given size were drawn repeatedly from the population and the coefficient calculated for each sample. Because coefficients estimated from random samples will deviate from population values by varying amounts, the estimates of the coefficients from a series of random samples of a population will not be identical, but instead will distribute themselves around a mean. The estimated standard deviation of the sampling distribution of a regression coefficient is known as a standard error, and is denoted by an "s" with a subscript of the regression coefficient of interest. In the bivariate case, the standard error of the slope coefficient estimator can be calculated by:

$$s_{b_1} = \sqrt{\frac{\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2 / (n - 2)}{\sum_{j=1}^{n} (X_{1j} - \bar{X}_1)^2}}$$  [1.11]

Extending this to the two-variable case yields formulae for

$$s_{b_1} = \sqrt{\frac{\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2 / (n - 3)}{\sum_{j=1}^{n} (X_{1j} - \bar{X}_1)^2 \, (1 - r_{X_1 X_2}^2)}} \qquad s_{b_2} = \sqrt{\frac{\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2 / (n - 3)}{\sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2 \, (1 - r_{X_1 X_2}^2)}}$$  [1.12]

Finally, we can go one step further and derive a formula for the standard error of the partial slope coefficient estimator for a model with any number of independent variables:

$$s_{b_i} = \sqrt{\frac{\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2}{\sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 \, (1 - R_i^2)(n - k - 1)}}$$  [1.13]
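The bivariate estimators in [1.9] and [1.10], the standard error in [1.11], and the idea of a sampling distribution can be illustrated with a small repeated-sampling simulation. What follows is only a minimal sketch in Python, not one of the authors' programs: the artificial population (intercept 2.0, slope 0.5, normally distributed error), the sample size of 50, the 2,000 draws, and every function name are assumptions made purely for illustration.

```python
import math
import random
import statistics


def ols_bivariate(xs, ys):
    """Return (a, b1, s_b1) for Y = a + b1*X + e, following [1.9], [1.10], [1.11]."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means (x_j and y_j in the text).
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum_xy / sum_xx              # slope, equation [1.9]
    a = y_bar - b1 * x_bar            # intercept, equation [1.10]
    # Sum of squared residuals, then the standard error of b1, equation [1.11].
    rss = sum((y - (a + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_b1 = math.sqrt((rss / (n - 2)) / sum_xx)
    return a, b1, s_b1


def make_population(size, alpha, beta, sigma, rng):
    """Hypothetical population: Y = alpha + beta*X + e, with e ~ Normal(0, sigma)."""
    population = []
    for _ in range(size):
        x = rng.uniform(0.0, 10.0)
        y = alpha + beta * x + rng.gauss(0.0, sigma)
        population.append((x, y))
    return population


def repeated_sampling(population, n, draws, rng):
    """Draw `draws` random samples of size n; collect each slope and standard error."""
    slopes, std_errors = [], []
    for _ in range(draws):
        sample = rng.sample(population, n)
        xs = [x for x, _ in sample]
        ys = [y for _, y in sample]
        _, b1, s_b1 = ols_bivariate(xs, ys)
        slopes.append(b1)
        std_errors.append(s_b1)
    return slopes, std_errors


rng = random.Random(12345)  # fixed seed so the illustration is reproducible
population = make_population(size=5000, alpha=2.0, beta=0.5, sigma=1.0, rng=rng)
_, pop_b1, _ = ols_bivariate([x for x, _ in population], [y for _, y in population])
slopes, std_errors = repeated_sampling(population, n=50, draws=2000, rng=rng)

print("population slope:", round(pop_b1, 3))
print("mean of sample slopes:", round(statistics.mean(slopes), 3))
print("std. dev. of sample slopes:", round(statistics.stdev(slopes), 3))
print("mean estimated standard error:", round(statistics.mean(std_errors), 3))
```

Under these assumptions, the mean of the sample slopes should lie close to the population slope, and the standard deviation of the sample slopes (the spread of the simulated sampling distribution) should be close to the average standard error computed within each sample from [1.11].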