Robust Covariance Estimation Campbell 1980
Author(s): N. A. Campbell
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 29, No. 3 (1980), pp. 231-237
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2346896
Accessed: 07-01-2016 13:20 UTC
This content downloaded from 140.127.23.2 on Thu, 07 Jan 2016 13:20:46 UTC
All use subject to JSTOR Terms and Conditions
Appl. Statist. (1980), 29, No. 3, pp. 231-237
Robust Procedures in Multivariate Analysis
I: Robust Covariance Estimation
By N. A. CAMPBELL†
Imperial College, London, UK
[Received November 1978. Final revision March 1980]
SUMMARY
The detection of atypical observations from multivariate data sets can be enhanced by examining probability plots of Mahalanobis squared distances using robust M-estimates of means and of covariances, rather than the usual maximum likelihood estimates. The weights associated with the robust estimation can also be used to indicate atypical observations. For uncontaminated data, the robust estimates are similar to the usual estimates. A procedure for robust principal component analysis is given; it also indicates atypical observations and provides an analysis relatively little influenced by such observations.
Keywords: ROBUST ESTIMATION; M-ESTIMATORS; OUTLIER DETECTION; PRINCIPAL COMPONENT ANALYSIS; MULTIVARIATE NORMALITY
1. INTRODUCTION
OBSERVATIONS which are grossly atypical in a single component can often be detected by applying univariate techniques to each variable. For multivariate data, observations are often only found to be atypical when the value for each variable is considered in relation to the other variables; some values fail to maintain the pattern of relationships between the variables evident in the majority of the observations. Since the performance of classical procedures is seriously influenced by atypical values, robust methods which are little influenced by atypical values provide an attractive complementary approach. The survey papers by Huber (1972) and Hampel (1973) summarize the important methods and results from the earlier years of univariate robust studies, while Hampel (1977) and Huber (1977b) include a review of more recent results. An introductory paper by Hogg (1977) gives the basic ideas.
The emphasis throughout this paper is on the provision of estimates of means and of covariances for a single group which are little influenced by atypical observations, and on the detection of observations having undue influence on the estimates. The underlying distribution is assumed, after transformation if necessary, to be symmetric; a multivariate Gaussian form is examined in the probability plotting. Gnanadesikan (1977, Chapter 5) discusses transformations to achieve approximate symmetry.
Robust M-estimation of means and covariances is reviewed in Section 2, and its use in conjunction with probability plots of the associated Mahalanobis squared distances is considered. A procedure for robust principal component analysis is proposed in Section 3. Typical data sets are examined in Section 4, while some general recommendations are given in Section 5.
2. ROBUST ESTIMATION OF MULTIVARIATE LOCATION AND SCATTER
Healy (1968) and Cox (1968) have suggested an extension of probability plots of univariate data to the multivariate situation, by plotting the Mahalanobis squared distance of each observation against the order statistic for a chi-squared distribution with $\nu$ d.f., where $\nu$ is the
† Now at Division of Mathematics and Statistics, CSIRO, Western Australia.
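The plotting idea of Healy (1968) and Cox (1968) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's own procedure: it uses the usual sample mean and covariance matrix (divisor n - 1), plotting positions (i - ½)/n, and the Wilson–Hilferty cube-root approximation in place of exact chi-squared quantiles. The function names `chi2_quantile_wh` and `healy_plot_points` are hypothetical. Points lying well above the line through the bulk of the plot are candidates for atypical observations.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile_wh(p: float, nu: int) -> float:
    """Approximate chi-squared quantile (Wilson-Hilferty, an ASSUMED
    approximation): (X / nu) ** (1/3) is roughly normal with mean
    1 - 2/(9 nu) and variance 2/(9 nu)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * nu)
    return nu * (1.0 - c + z * math.sqrt(c)) ** 3


def healy_plot_points(X):
    """Return (chi-squared quantiles, sorted d^2, row order) for a
    Healy-type probability plot based on the usual estimates."""
    X = np.asarray(X, dtype=float)
    n, nu = X.shape
    R = X - X.mean(axis=0)                  # deviations from the sample mean
    V = np.cov(X, rowvar=False)             # usual covariance, divisor n - 1
    # Mahalanobis squared distance r_i' V^{-1} r_i for every row
    d2 = np.einsum("ij,ij->i", R, np.linalg.solve(V, R.T).T)
    order = np.argsort(d2)
    probs = (np.arange(1, n + 1) - 0.5) / n  # ASSUMED plotting positions
    q = np.array([chi2_quantile_wh(p, nu) for p in probs])
    return q, d2[order], order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    X[17] = [8.0, -8.0, 8.0]                # one planted atypical observation
    q, d2_sorted, order = healy_plot_points(X)
    print("largest d^2:", d2_sorted[-1], "from row", order[-1])
```

Plotting `d2_sorted` against `q` gives the probability plot; swapping in robust estimates of the mean and covariance, as the paper recommends, changes only the centring vector and `V`.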
232 APPLIED STATISTICS
ROBUST PROCEDURES IN MULTIVARIATE ANALYSIS I 233
Stem-and-leaf plot for ratios of robust to usual variances for generated multivariate Gaussian data with the same underlying structure as the Thais data (7 variables, 60 groups).
  Stem | Leaves
  1.01 | 1ᵃ, 2, 2, 3, 3
  1.00 | 0(243)ᵇ, 1(19), 2(9), 3, 3, 4, 4, 5, 6, 6, 6, 7
  0.99 | 0(9), 1(6), 2, 3(10), 4(7), 5(5), 6(7), 7(7), 8(13), 9(19)
  0.98 | 1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 8, 8, 8, 9, 9, 9
  0.97 | 3, 4, 4, 7, 8, 8
  0.96 | 1, 2, 2, 2, 7, 9
  Lower values: 0.952, 0.954, 0.957, 0.947, 0.934, 0.934, 0.938, 0.927, 0.928, 0.929, 0.910, 0.912, 0.912, 0.89, 0.88, 0.88, 0.87, 0.87, 0.86, 0.82
ᵃ Value is 1.011.
ᵇ Value 1.000 is repeated 243 times.
of the variances will generally be within 2-3 per cent of the usual estimates for well-behaved data. For multivariate Gaussian data, multiplicative correction factors can be calculated to give large sample agreement of the expected value of $d^2$ under robust and usual estimation.
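The stem-and-leaf plot compares robust and usual variance estimates on well-behaved Gaussian data. A minimal sketch of such a comparison follows. The weighting scheme is an assumption for illustration, a generic Huber-type iteration (weights min(1, d0/d) on the mean, squared weights on the covariance with divisor sum(w²) - 1), and is not a transcription of the paper's definitions; the cutoff d0 = 3.5 and the name `huber_m_estimate` are hypothetical. The same weights show how the summary's claim that weights indicate atypical observations works in practice: a grossly atypical row ends up with a weight well below 1.

```python
import numpy as np


def huber_m_estimate(X, d0, max_iter=200, tol=1e-10):
    """Iteratively reweighted mean and covariance (ASSUMED Huber-type scheme).

    w_i  = min(1, d0 / d_i), d_i the current Mahalanobis distance
    mean = sum(w_i x_i) / sum(w_i)
    cov  = sum(w_i^2 r_i r_i') / (sum(w_i^2) - 1)
    Returns (mean, covariance, weights)."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                      # start from the usual estimates
    V = np.cov(X, rowvar=False)
    for _ in range(max_iter):
        R = X - m
        d = np.sqrt(np.einsum("ij,ij->i", R, np.linalg.solve(V, R.T).T))
        # caps each point's weighted distance w_i * d_i at d0
        w = np.minimum(1.0, d0 / np.maximum(d, 1e-12))
        m_new = w @ X / w.sum()
        R = X - m_new
        V_new = (R * (w ** 2)[:, None]).T @ R / (np.sum(w ** 2) - 1.0)
        done = np.allclose(m, m_new, atol=tol) and np.allclose(V, V_new, atol=tol)
        m, V = m_new, V_new
        if done:
            break
    return m, V, w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((500, 4))
    m, V, w = huber_m_estimate(X, d0=3.5)
    usual = np.cov(X, rowvar=False)
    print("robust/usual variance ratios:", np.diag(V) / np.diag(usual))
    X[0] = 10.0                             # plant one gross outlier
    m, V, w = huber_m_estimate(X, d0=3.5)
    print("rows with smallest weights:", np.argsort(w)[:3])
```

On clean Gaussian data the ratios printed first should sit close to 1, since few points exceed the cutoff.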
Asymptotically, $E(d^2) = \nu$ if $b_1 = \infty$. For the non-descending influence function,

$$E(d^2) = \nu\,\mathrm{Ch}(d_0^2;\,\nu+2) + d_0^2\,\{1 - \mathrm{Ch}(d_0^2;\,\nu)\},$$

where $\mathrm{Ch}(\cdot\,;\,\nu)$ denotes the cumulative distribution function of chi-squared on $\nu$ d.f. For the redescending influence function,
FIG. 1. [Figure: three probability plots against Gaussian quantiles (horizontal axes from -3.2 to 3.2); one panel is labelled $b_1 = 2.0$, $b_2 = 1.25$.]
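The expression given earlier for $E(d^2)$ under the non-descending influence function, $\nu\,\mathrm{Ch}(d_0^2;\nu+2) + d_0^2\{1 - \mathrm{Ch}(d_0^2;\nu)\}$, can be checked numerically. Because $E\{X;\,X \le c\} = \nu\,\mathrm{Ch}(c;\,\nu+2)$ for $X \sim \chi^2_\nu$, the right-hand side equals the mean of $\min(d^2, d_0^2)$ when $d^2$ is chi-squared on $\nu$ d.f., so a short simulation should reproduce it. The sketch below makes no assumption about how $d_0$ is chosen (the value 3.0 in the usage line is arbitrary), the function names are hypothetical, and the chi-squared CDF comes from the power series for the regularized incomplete gamma function, which is adequate for the moderate arguments used here.

```python
import math
import random


def chi2_cdf(x: float, nu: int) -> float:
    """Ch(x; nu): chi-squared CDF on nu d.f., via the power series for the
    regularized lower incomplete gamma function P(nu/2, x/2).
    Intended for moderate x (roughly x < 1000)."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a          # n = 0 term of sum z^n / (a(a+1)...(a+n))
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def expected_d2_nondescending(d0: float, nu: int) -> float:
    """nu * Ch(d0^2; nu + 2) + d0^2 * {1 - Ch(d0^2; nu)}."""
    c = d0 * d0
    return nu * chi2_cdf(c, nu + 2) + c * (1.0 - chi2_cdf(c, nu))


def simulated_mean_truncated_d2(d0: float, nu: int, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo mean of min(d^2, d0^2) with d^2 ~ chi-squared on nu d.f."""
    rng = random.Random(seed)
    c = d0 * d0
    total = 0.0
    for _ in range(n):
        d2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        total += min(d2, c)
    return total / n


if __name__ == "__main__":
    print(expected_d2_nondescending(3.0, 7), simulated_mean_truncated_d2(3.0, 7))
```

As $d_0$ grows the expression tends to $\nu$, matching the statement that $E(d^2) = \nu$ when $b_1 = \infty$.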
REFERENCES
CAMPBELL, C. A. (1978). The frilled dogwinkle: ecological genetics of a morphologically variable snail, Thais lamellosa. Unpublished Ph.D. thesis, University of California, Davis.
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A, 131, 265-279.
GNANADESIKAN, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.
HAMPEL, F. R. (1973). Robust estimation: a condensed partial survey. Z. Wahr. verw. Geb., 27, 87-104.
HAMPEL, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Ass., 69, 383-393.
HAMPEL, F. R. (1977). Modern trends in the theory of robustness. Research Report No. 13, Swiss Federal Institute of Technology, Zurich.
HEALY, M. J. R. (1968). Multivariate normal plotting. Appl. Statist., 17, 157-161.
HOGG, R. V. (1977). An introduction to robust procedures. Comm. Statist. Theor. Meth., A6, 789-794.
HUBER, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist., 35, 73-101.
HUBER, P. J. (1972). Robust statistics: a review. Ann. Math. Statist., 43, 1041-1067.
HUBER, P. J. (1977a). Robust covariances. In Statistical Decision Theory and Related Topics II (S. S. Gupta and D. S. Moore, eds), pp. 165-191. New York: Academic Press.
HUBER, P. J. (1977b). Robust Statistical Procedures. Philadelphia: SIAM.