Influential Observation
Assessment
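Figure: Anscombe's quartet. The two datasets on the bottom both contain influential points.
All four sets are identical when examined using simple summary statistics, but vary
considerably when graphed. If one point is removed, the line would look very different.

Various methods have been proposed for measuring influence.[3][4] Assume an estimated
regression $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{y}$ is an n×1
column vector for the response variable, $\mathbf{X}$ is the n×k design matrix of explanatory
variables (including a constant), $\mathbf{e}$ is the n×1 residual vector, and $\mathbf{b}$ is
a k×1 vector of estimates of some population parameter $\boldsymbol{\beta}$. Also define
$\mathbf{H} \equiv \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}$,
the projection matrix of $\mathbf{X}$. Then we have the following measures of influence:

$$\mathrm{DFBETA}_i \equiv \mathbf{b} - \mathbf{b}_{(-i)} = \frac{(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}} e_i}{1 - h_i}$$

where $\mathbf{b}_{(-i)}$ denotes the coefficients estimated with the i-th row $\mathbf{x}_i$
of $\mathbf{X}$ deleted, and
$h_i = \mathbf{x}_i(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}}$ denotes
the i-th value of the main diagonal of $\mathbf{H}$. Thus DFBETA measures the difference in
each parameter estimate with and without the influential point. There is a DFBETA for each
variable and each observation (if there are n observations and k variables there are n·k
DFBETAs).[5] The table shows DFBETAs for the third dataset from Anscombe's quartet (bottom
left chart in the figure):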
x y intercept slope
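As a rough illustration of the DFBETA definition, the following Python sketch computes intercept and slope DFBETAs for Anscombe's third dataset in two ways: from the closed-form expression involving the residual and the leverage, and by refitting the regression with each observation deleted. Several things here are assumptions rather than part of this article: the data values are the standard published ones for the quartet, the function names are hypothetical, and the closed form is specialised by hand to one explanatory variable plus a constant (k = 2), so the 2×2 matrix inverse is written out explicitly.

```python
# Anscombe's third dataset (standard published values; an assumption,
# since the article's table rows are not reproduced here).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def ols(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - slope * sx) / n, slope


def dfbeta_closed_form(xs, ys):
    """DFBETA_i = (X'X)^{-1} x_i' e_i / (1 - h_i), specialised to k = 2."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(v * v for v in xs)
    det = n * sxx - sx * sx              # determinant of X'X
    b0, b1 = ols(xs, ys)
    out = []
    for xi, yi in zip(xs, ys):
        e = yi - (b0 + b1 * xi)          # residual e_i
        # (X'X)^{-1} x_i' for the design row x_i = (1, xi)
        c0 = (sxx - sx * xi) / det
        c1 = (n * xi - sx) / det
        h = c0 + c1 * xi                 # leverage h_i = x_i (X'X)^{-1} x_i'
        out.append((c0 * e / (1 - h), c1 * e / (1 - h)))
    return out


def dfbeta_by_deletion(xs, ys):
    """DFBETA_i = b - b_(-i), obtained by refitting without observation i."""
    b0, b1 = ols(xs, ys)
    out = []
    for i in range(len(xs)):
        d0, d1 = ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append((b0 - d0, b1 - d1))
    return out


closed = dfbeta_closed_form(x, y)
refit = dfbeta_by_deletion(x, y)

# One row per observation: x, y, intercept DFBETA, slope DFBETA.
for xi, yi, (d0, d1) in zip(x, y, closed):
    print(f"{xi:>4} {yi:>6.2f} {d0:>10.4f} {d1:>10.4f}")
```

Under these assumptions the two approaches should agree to within floating-point error, and the observation at x = 13 stands out with by far the largest slope DFBETA (roughly 0.15), which matches the visual impression that deleting that one point changes the fitted line.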
See also
Influence function (statistics)
Outlier
Leverage
Partial leverage
Regression analysis
Cook's distance § Detecting highly influential observations
Anomaly detection
References
1. Burt, James E.; Barber, Gerald M.; Rigby, David L. (2009), Elementary Statistics for
Geographers (https://books.google.com/books?id=p7YMOPuu8ugC&pg=PA513), Guilford
Press, p. 513, ISBN 9781572304840.
2. Everitt, Brian (1998). The Cambridge Dictionary of Statistics
(https://archive.org/details/cambridgediction00ever_0). Cambridge, UK; New York:
Cambridge University Press. ISBN 0-521-59346-8.
3. Winner, Larry (March 25, 2002). "Influence Statistics, Outliers, and Collinearity Diagnostics"
(http://stat.ufl.edu/~winner/sta6127/influence.doc).
4. Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity
(https://books.google.com/books?id=GECBEUJVNe0C&pg=PA11). Wiley Series in Probability
and Mathematical Statistics. New York: John Wiley & Sons. pp. 11–16. ISBN 0-471-05856-4.
5. "Outliers and DFBETA"
(http://www.albany.edu/faculty/kretheme/PAD705/SupportMat/DFBETA.pdf) (PDF). Archived
(https://web.archive.org/web/20130511013229/http://www.albany.edu/faculty/kretheme/PAD705/SupportMat/DFBETA.pdf)
(PDF) from the original on May 11, 2013.
6. Grubbs, F. E. (February 1969). "Procedures for detecting outlying observations in samples".
Technometrics. 11 (1): 1–21. doi:10.1080/00401706.1969.10490657
(https://doi.org/10.1080%2F00401706.1969.10490657). "An outlying observation, or "outlier,"
is one that appears to deviate markedly from other members of the sample in which it occurs."
7. Maddala, G. S. (1992). "Outliers" (https://books.google.com/books?id=nBS3AAAAIAAJ&pg=PA89).
Introduction to Econometrics (https://archive.org/details/introductiontoec00madd/page/89)
(2nd ed.). New York: MacMillan. p. 89 (https://archive.org/details/introductiontoec00madd/page/89).
ISBN 978-0-02-374545-4. "An outlier is an observation that is far removed from the rest of
the observations."
8. Everitt, B. S. (2002). Cambridge Dictionary of Statistics. Cambridge University Press.
ISBN 0-521-81099-X.
Further reading
Dehon, Catherine; Gassner, Marjorie; Verardi, Vincenzo (2009). "Beware of 'Good' Outliers
and Overoptimistic Conclusions". Oxford Bulletin of Economics and Statistics. 71 (3):
437–452. doi:10.1111/j.1468-0084.2009.00543.x
(https://doi.org/10.1111%2Fj.1468-0084.2009.00543.x). S2CID 154376487
(https://api.semanticscholar.org/CorpusID:154376487).
Kennedy, Peter (2003). "Robust Estimation"
(https://books.google.com/books?id=B8I5SP69e4kC&pg=PA372). A Guide to Econometrics
(Fifth ed.). Cambridge: The MIT Press. pp. 372–388. ISBN 0-262-61183-X.