
Influential Observation

This document discusses influential observations in statistics. An influential observation is one whose removal would noticeably change the results of a statistical calculation, particularly the parameter estimates in regression analysis. Several methods for measuring influence are described, including DFBETA, which measures the difference in parameter estimates with and without the observation. High-leverage points and outliers are also discussed as atypical observations that can strongly influence the regression line. The bottom datasets in Anscombe's quartet provide examples of influential points and outliers.

Uploaded by

sophia787
Copyright
© All Rights Reserved

Influential observation

In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation.[1] In particular, in regression analysis an influential observation is one whose deletion has a large effect on the parameter estimates.[2]
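
The deletion idea in this definition can be sketched for any statistic, not only regression coefficients. The snippet below is a minimal illustration under stated assumptions: the helper name `leave_one_out_changes` and the five-value sample are made up for this example and do not come from the cited sources.

```python
import statistics

def leave_one_out_changes(values, statistic):
    """Return, for each position i, statistic(all values) minus
    statistic(values with the i-th one deleted)."""
    full = statistic(values)
    return [full - statistic(values[:i] + values[i + 1:])
            for i in range(len(values))]

# Hypothetical sample: four similar values and one far-away value.
sample = [2.0, 3.0, 3.5, 4.0, 30.0]

mean_changes = leave_one_out_changes(sample, statistics.mean)
median_changes = leave_one_out_changes(sample, statistics.median)

for value, dm, dmed in zip(sample, mean_changes, median_changes):
    print(f"delete {value:5.1f}: mean changes by {dm:+.3f}, median by {dmed:+.3f}")
```

In this made-up sample, deleting the value 30.0 moves the mean far more than deleting any other value, so it is the influential observation for the mean. The same deletion barely moves the median, which shows that influence depends on the calculation as well as on the data.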

Figure: In Anscombe's quartet, the two datasets on the bottom both contain influential points. All four sets are identical when examined using simple summary statistics, but vary considerably when graphed. If one point is removed, the line would look very different.

Assessment
Various methods have been proposed for measuring influence.[3][4] Assume an estimated regression $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{y}$ is an n×1 column vector for the response variable, $\mathbf{X}$ is the n×k design matrix of explanatory variables (including a constant), $\mathbf{e}$ is the n×1 residual vector, and $\mathbf{b}$ is a k×1 vector of estimates of some population parameter $\boldsymbol{\beta}$. Also define $\mathbf{H} \equiv \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}$, the projection matrix of $\mathbf{X}$. Then we have the following measures of influence:

1. $\text{DFBETA}_i \equiv \mathbf{b} - \mathbf{b}_{(-i)} = \dfrac{(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}} e_i}{1 - h_i}$, where $\mathbf{b}_{(-i)}$ denotes the coefficients estimated with the i-th row $\mathbf{x}_i$ of $\mathbf{X}$ deleted, $e_i$ is the i-th residual, and $h_i = \mathbf{x}_i(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}}$ denotes the i-th element of the main diagonal of $\mathbf{H}$. Thus DFBETA measures the difference in each parameter estimate with and without the i-th observation. There is a DFBETA for each variable and each observation (if there are n observations and k variables, there are n·k DFBETAs).[5] The table below shows DFBETAs for the third dataset from Anscombe's quartet (bottom left chart in the figure):
      x       y    DFBETA (intercept)   DFBETA (slope)
   10.0    7.46         -0.005              -0.044
    8.0    6.77         -0.037               0.019
   13.0   12.74       -357.910             525.268
    9.0    7.11         -0.033               0
   11.0    7.81          0.049              -0.117
   14.0    8.84          0.490              -0.667
    6.0    6.08          0.027              -0.021
    4.0    5.39          0.241              -0.209
   12.0    8.15          0.137              -0.231
    7.0    6.42         -0.020               0.013
    5.0    5.73          0.105              -0.087

2. DFFITS (difference in fits) measures the change in the fitted value for an observation when that observation is deleted.


3. Cook's D measures the effect of removing a data point on all the parameters combined.[2]
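
The three measures listed above can be sketched numerically for the third Anscombe dataset. This is a minimal sketch using NumPy, not a definitive implementation. The DFBETA line uses the closed form given in item 1. The DFFITS and Cook's D expressions are the usual textbook forms and are an assumption here, since the text above does not state them. The raw differences computed this way are not expected to reproduce the tabulated DFBETAs, which appear to be on a scaled (DFBETAS-style) scale. That reading is also an assumption. The helper `dfbeta_by_refit` is a hypothetical name used only to cross-check the closed form.

```python
import numpy as np

# Third Anscombe dataset, in the same row order as the table above.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
              6.08, 5.39, 8.15, 6.42, 5.73])

X = np.column_stack([np.ones_like(x), x])    # n x k design matrix (constant, x)
n, k = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # OLS estimates
e = y - X @ b                                # residual vector
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_i, the diagonal of H

# DFBETA_i = (X'X)^{-1} x_i' e_i / (1 - h_i); row i holds (intercept, slope).
dfbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]

def dfbeta_by_refit(X, y):
    """Brute-force check: refit with each row deleted and take b - b_(-i)."""
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    rows = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b_minus_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        rows.append(b_full - b_minus_i)
    return np.array(rows)

# Assumed textbook forms (not given in the text above):
s2 = e @ e / (n - k)                                         # residual variance
s2_minus_i = ((n - k) * s2 - e**2 / (1 - h)) / (n - k - 1)   # variance without row i
dffits = e / np.sqrt(s2_minus_i * (1 - h)) * np.sqrt(h / (1 - h))
cooks_d = e**2 * h / (k * s2 * (1 - h) ** 2)

for i in range(n):
    print(f"x={x[i]:5.1f}  DFBETA={dfbeta[i].round(4)}  "
          f"DFFITS={dffits[i]:10.3f}  CookD={cooks_d[i]:.3f}")
```

Comparing `dfbeta` with `dfbeta_by_refit(X, y)` is a quick way to confirm that the closed form agrees with actually deleting each row and refitting. In this dataset the observation at x = 13.0 stands out on all three measures.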

Outliers, leverage and influence


An outlier may be defined as a data point that differs significantly from other observations.[6][7] High-leverage points are observations made at extreme values of the independent variables.[8] Both types of atypical observation will force the regression line to be close to the point.[2] In Anscombe's quartet, the bottom right image has a point with high leverage and the bottom left image has an outlying point.
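
Leverage can be read off the diagonal of the projection matrix of the design matrix, so it depends only on the explanatory variables and not on the response. The sketch below, a minimal illustration assuming NumPy, computes it for the x-values of the fourth Anscombe dataset (the bottom right panel), where ten observations share x = 8 and one sits at x = 19. The variable names are illustrative.

```python
import numpy as np

# x-values of the fourth Anscombe dataset: ten points at x = 8, one at x = 19.
x = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)

X = np.column_stack([np.ones_like(x), x])   # design matrix with a constant
H = X @ np.linalg.inv(X.T @ X) @ X.T        # projection ("hat") matrix
leverage = np.diag(H)                       # one leverage value per observation

for xi, hi in zip(x, leverage):
    print(f"x={xi:5.1f}  leverage={hi:.3f}")
```

The single point at x = 19 gets a leverage far above the others because it alone determines the slope: the remaining points all have the same x-value and so carry no information about it.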

See also
Influence function (statistics)
Outlier
Leverage
Partial leverage
Regression analysis
Cook's distance § Detecting highly influential observations
Anomaly detection

References
1. Burt, James E.; Barber, Gerald M.; Rigby, David L. (2009), Elementary Statistics for Geographers (https://books.google.com/books?id=p7YMOPuu8ugC&pg=PA513), Guilford Press, p. 513, ISBN 9781572304840.
2. Everitt, Brian (1998). The Cambridge Dictionary of Statistics (https://archive.org/details/cambridgediction00ever_0). Cambridge, UK; New York: Cambridge University Press. ISBN 0-521-59346-8.
3. Winner, Larry (March 25, 2002). "Influence Statistics, Outliers, and Collinearity Diagnostics" (http://stat.ufl.edu/~winner/sta6127/influence.doc).
4. Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (https://books.google.com/books?id=GECBEUJVNe0C&pg=PA11). Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. pp. 11–16. ISBN 0-471-05856-4.
5. "Outliers and DFBETA" (http://www.albany.edu/faculty/kretheme/PAD705/SupportMat/DFBETA.pdf) (PDF). Archived (https://web.archive.org/web/20130511013229/http://www.albany.edu/faculty/kretheme/PAD705/SupportMat/DFBETA.pdf) (PDF) from the original on May 11, 2013.
6. Grubbs, F. E. (February 1969). "Procedures for detecting outlying observations in samples". Technometrics. 11 (1): 1–21. doi:10.1080/00401706.1969.10490657 (https://doi.org/10.1080%2F00401706.1969.10490657). "An outlying observation, or "outlier," is one that appears to deviate markedly from other members of the sample in which it occurs."
7. Maddala, G. S. (1992). "Outliers" (https://books.google.com/books?id=nBS3AAAAIAAJ&pg=PA89). Introduction to Econometrics (https://archive.org/details/introductiontoec00madd/page/89) (2nd ed.). New York: MacMillan. p. 89 (https://archive.org/details/introductiontoec00madd/page/89). ISBN 978-0-02-374545-4. "An outlier is an observation that is far removed from the rest of the observations."
8. Everitt, B. S. (2002). Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X.

Further reading
Dehon, Catherine; Gassner, Marjorie; Verardi, Vincenzo (2009). "Beware of 'Good' Outliers and Overoptimistic Conclusions". Oxford Bulletin of Economics and Statistics. 71 (3): 437–452. doi:10.1111/j.1468-0084.2009.00543.x (https://doi.org/10.1111%2Fj.1468-0084.2009.00543.x). S2CID 154376487 (https://api.semanticscholar.org/CorpusID:154376487).
Kennedy, Peter (2003). "Robust Estimation" (https://books.google.com/books?id=B8I5SP69e4kC&pg=PA372). A Guide to Econometrics (Fifth ed.). Cambridge: The MIT Press. pp. 372–388. ISBN 0-262-61183-X.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Influential_observation&oldid=1159896875"
