Cobb 1997

The document discusses the distinctions between statistical thinking and mathematical thinking, emphasizing the importance of context in statistics. It highlights that statistics is a methodological discipline focused on understanding variability and data interpretation, while mathematics often prioritizes abstract patterns. The authors argue for the necessity of real-world examples in teaching statistics to effectively convey the interplay between data patterns and their contextual meanings.

Mathematics, Statistics, and Teaching

Author(s): George W. Cobb and David S. Moore


Source: The American Mathematical Monthly, Vol. 104, No. 9 (Nov., 1997), pp. 801-823
Published by: Mathematical Association of America
Stable URL: http://www.jstor.org/stable/2975286
Accessed: 19/09/2010 03:22


How does statistical thinking differ from mathematical thinking? What is the role of mathematics in statistics? If you purge statistics of its mathematical content, what intellectual substance remains?
In what follows, we offer some answers to these questions and relate them to a sequence of examples that provide an overview of current statistical practice. Along the way, and especially toward the end, we point to some implications for the teaching of statistics.

1. INTRODUCTION: AN OVERVIEW OF STATISTICAL THINKING. Statistics is a methodological discipline. It exists not for itself but rather to offer to other fields of study a coherent set of ideas and tools for dealing with data. The need for such a discipline arises from the omnipresence of variability. Individuals vary. Repeated measurements on the same individual vary. In some circumstances, we want to find unusual individuals in an overwhelming mass of data. In others, the focus is on the variation of measurements. In yet others, we want to detect systematic effects against the background noise of individual variation. Statistics provides means for dealing with data that take into account the omnipresence of variability.
1.1. The role of context. The focus on variability naturally gives statistics a particular content that sets it apart from mathematics itself and from other mathematical sciences, but there is more than just content that distinguishes statistical thinking from mathematics. Statistics requires a different kind of thinking, because data are not just numbers, they are numbers with a context.

Example 1. The mystery of Andover. The finite sequence (3, 5, 23, 37, 6, 8, 20, 22, 1, 3) shows a distinctive pattern when plotted (Figure 1), but the numbers and the pattern have no meaning or interest until we know their context. They are in fact monthly totals of people formally accused of witchcraft in Essex County, Massachusetts, beginning in February 1692. The plot shows two waves of accusations, separated by a low point in the summer of 1692. The pattern becomes still more meaningful when we know that the first hanging of a convicted witch (Bridget Bishop) took place June 10, 1692: it is not hard to imagine the sobering effect of that first execution in the small community of Salem Village (now Danvers). But why the second wave of accusations? It turns out that the accusations in the first wave were directed against residents of Salem Village, Salem Town, and all but one of the half-dozen immediately adjacent towns; in the second wave the majority of the accusations were directed against residents of the one other adjacent town, Andover. Our sources [3, 4] do not provide much explanation for what happened in Andover, but the pattern, together with what we know of the context, tells at least part of a story and raises some interesting questions.
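The pattern in Example 1 can be located mechanically, as the short Python sketch below shows. It is an illustration added here, not part of the article. The helper names `interior_peaks` and `trough_between` are hypothetical. The month labels assume, as the text states, that the series begins in February 1692.

The code finds the two waves and the low point between them. Only the context (the June 10 hanging, the shift to Andover) turns that pattern into a story.

```python
# Monthly totals of people accused of witchcraft (Example 1), starting February 1692.
counts = [3, 5, 23, 37, 6, 8, 20, 22, 1, 3]
months = ["Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]


def interior_peaks(values):
    """Indices i (not at either end) where values[i] exceeds both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


def trough_between(values, left, right):
    """Index of the smallest value strictly between two peak indices."""
    return min(range(left + 1, right), key=lambda i: values[i])


peaks = interior_peaks(counts)
low = trough_between(counts, peaks[0], peaks[-1])

print("total accused:", sum(counts))
print("peaks:", [(months[i], counts[i]) for i in peaks])
print("low point between waves:", months[low], counts[low])
```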

[Figure 1 plot: monthly number of people accused (vertical axis, 0 to 40) against month (horizontal axis, ticks at Jan, Apr, Jul, Oct).]
Figure 1. Numbers of people accused of witchcraft in Essex County, MA, 1692.

Although this first example has almost no mathematical content, its interplay
between pattern and context is typical of the interpretive part of statistical
thinking. For a more familiar example of a very different sort, consider testing that
two normal distributions have equal means.

Example 2a. A model for comparing normal means. Consider the standard model involving two sets of independent, identically distributed (iid) random variables:

$$X_1, X_2, \ldots, X_n \ \text{iid}\ N(\mu_1, \sigma_1^2), \qquad Y_1, Y_2, \ldots, Y_m \ \text{iid}\ N(\mu_2, \sigma_2^2).$$

It follows that $\bar{x} = (\sum x_i)/n$ and $s_1^2 = \sum (x_i - \bar{x})^2/(n-1)$ are sufficient statistics for $\mu_1$ and $\sigma_1^2$, with parallel results for the $Y$s.

Informally, a statistic is sufficient for a parameter if it uses all the information about that parameter contained in the sample. More formally, the conditional distribution of the data, given the sufficient statistic, doesn't depend on the parameter. The Rao-Blackwell Theorem guarantees that no unbiased estimator can have a smaller variance than one based on a sufficient statistic.

Both $\bar{x}$ and $s_1^2$ are unbiased: $E(\bar{x}) = \mu_1$ and $E(s_1^2) = \sigma_1^2$. Finally, their joint distribution is known: the sample mean $\bar{x}$ is normal with variance $\sigma_1^2/n$, and, independently, $(n-1)s_1^2/\sigma_1^2$ is chi-square on $n-1$ degrees of freedom.

Suppose now we want to test $H_0\colon \mu_1 = \mu_2$. If $\sigma_1^2 = \sigma_2^2$, then a sufficient and unbiased estimator for the common variance is obtained by pooling:

$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}.$$

If $H_0$ is true, then

$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{1/n + 1/m}}$$

has a Student's t-distribution on $n+m-2$ degrees of freedom. We can use the value of $t$ computed from the data to test the null hypothesis. If $t$ is far enough from 0, we conclude that $\mu_1 \neq \mu_2$.
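Once the sufficient statistics are in hand, the pooled test of Example 2a reduces to a few lines of arithmetic. The sketch below is illustrative and not from the article. The function name `pooled_t` is hypothetical, and it uses only Python's standard `statistics` module.

It returns t and its degrees of freedom but not a p-value, which would need a t-distribution CDF from another library.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance (n - 1 divisor)


def pooled_t(xs, ys):
    """Pooled two-sample t statistic and degrees of freedom.

    Assumes, as the equal-variance model does, that sigma_1^2 == sigma_2^2.
    """
    n, m = len(xs), len(ys)
    s1_sq, s2_sq = variance(xs), variance(ys)                  # s_1^2 and s_2^2
    sp_sq = ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)  # pooled s_p^2
    t = (mean(xs) - mean(ys)) / (sqrt(sp_sq) * sqrt(1 / n + 1 / m))
    return t, n + m - 2


t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t, 4), df)  # -3.6742 4
```

If SciPy is available, `scipy.stats.ttest_ind(xs, ys)`, whose default `equal_var=True` corresponds to this pooled test, also reports a p-value.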



This example differs most strikingly from the first in two ways: mathematical content and the role of context. Example 1, which has essentially no mathematical content, finds its intellectual substance almost entirely in the interplay between pattern and story. Example 2, which has essentially no content apart from mathematics, gets its intellectual substance without any explicit reference to applied context.
Although mathematicians often rely on applied context both for motivation and as a source of problems for research, the ultimate focus in mathematical thinking is on abstract patterns: the context is part of the irrelevant detail that must be boiled off over the flame of abstraction in order to reveal the previously hidden crystal of pure structure. In mathematics, context obscures structure. Like mathematicians, data analysts also look for patterns, but ultimately, in data analysis, whether the patterns have meaning, and whether they have any value, depends on how the threads of those patterns interweave with the complementary threads of the story line. In data analysis, context provides meaning.
The difference has profound implications for teaching. To teach statistics well, it is not enough to understand the mathematical theory; it is not even enough to understand also the additional, non-mathematical theory of statistics. One must, like a teacher of literature, have a ready supply of real illustrations, and know how to use them to involve students in the development of their critical judgment.

In mathematics, where applied context is so much less important, improvised examples often work well, and teachers of mathematics become skillful at inventing examples on the spot. (Need a function to illustrate the chain rule? No problem: just make one up.) In statistics, however, improvised examples don't work, because they don't provide authentic interplay between pattern and context.

Much as Bertrand Russell likened mathematics to sculpture for the austerity of its abstraction, one might think of data analysis as like poetry, where pattern and context are inseparable. Imagine yourself teaching a lesson on basic prosody, introducing dactylic hexameter. It is not enough to say "TA ta ta, TA ta ta, TA ta ta, . . ."; your students need to hear dactyls in a real poem [20]: "This is the forest primeval. The murmuring pines and the hemlocks."

In a similar spirit, the teacher of statistics needs to know the data literature. If, for example, when you teach plots for data distributions, you use data on inter-eruption times for Old Faithful [30] and lengths of reigns of English kings and queens [13], your students can learn more than just the methods themselves. The bimodal shape of the inter-eruption times suggests two kinds of eruptions, and the distribution of monarchs' reigns shows the skewness toward high values that is typical of waiting times.
The contrasting roles of context in mathematics and statistics, especially as illustrated in the deliberately extreme first two examples, might seem to lend support to the false implication in Bullock's [5] assertion that "Many statisticians now claim that their subject is something quite apart from mathematics, so that statistics courses do not require any preparation in mathematics."

In fact, while we find the evidence that statistics is not mathematics persuasive (see [22], [24]), all statistics courses require some preparation in mathematics, and some require a great deal. Elaborate mathematical theories undergird some parts of statistics, and the study of those theories is part of the training of statisticians. But although statistics cannot prosper without mathematics, the converse fails. That statistics is not a necessary part of a mathematician's training is implicit in the statement by the eminent probabilist David Aldous [1] that he "is interested in the applications of probability to all scientific fields except statistics."



What, then, is the role of mathematics in the science of statistics? An answer should begin with a more systematic look at the logic of analyzing data.

1.2. A schematic overview of statistical analysis. An old-style course that wanted to be conscientious about applications might finish off the second example with a little coda of an exercise. The data, although not this invented exercise, are from [25]; the full study is described in [21].

Example 2b. Calcium and blood pressure. Does increasing the amount of calcium in our diet reduce blood pressure? The following numbers give the decrease after 12 weeks in systolic blood pressure for 21 human subjects. The 10 subjects in Group 1 took a calcium supplement for 12 weeks; the 11 in Group 2 took a placebo. Test the hypothesis that the calcium had no effect on blood pressure.
Group 1 (calcium): 7, -4, 18, 17, -3, -5, 1, 10, 11, -2
Group 2 (placebo): -1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3
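As a first step toward the terse exercise above, the short Python sketch below computes each group's size, mean, and standard deviation. These are the sufficient statistics of Example 2a. The sketch is an illustration, not from the article, and it only restates the data. As the discussion that follows argues, this arithmetic is the least interesting part of the analysis.

```python
from statistics import mean, stdev

# Decrease in systolic blood pressure after 12 weeks (Example 2b).
calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]

for name, group in [("calcium", calcium), ("placebo", placebo)]:
    # n, sample mean, and sample standard deviation (n - 1 divisor)
    print(f"{name}: n={len(group)}, mean={mean(group):.2f}, sd={stdev(group):.2f}")
```

The two standard deviations (about 8.7 and 5.9) already differ noticeably, a point the exploratory analysis of Example 2d takes up.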

This exercise, put so tersely, is a caricature, one that encourages the mistaken view that once the mathematical derivations from a model are completed, applications are largely a matter of routine arithmetic. For a more realistic perspective, consider Figure 2, a diagram of the stages in a statistical analysis. Before considering this crude outline in detail, two cautions are essential.

Design --(1)--> Data --(2)--> Patterns
Model(s) --(3)--> Methods --(4)--> Results --(5)--> Interpretation

Figure 2. A schematic representation of the phases of data production and analysis.

1. The summary oversimplifies by suggesting a strict left-to-right progression. In reality, the process of data analysis is neither linear nor unidirectional. Several transitions involve a dialog of sorts, sometimes between adjacent elements, but sometimes among more than just two. Thus, for example, the choice of design for data production determines the structure of the resulting data, but knowledge based on data already in hand can help shape the design, as when knowing the size of variation from one subject to another helps decide how many subjects will be needed. Similarly, the data may suggest a model, but the model leads to methods that send us back to the data to check for possible violations of the model's assumptions. Perhaps most important of all, as we shall see, the final stage, interpretation of the results, depends in a crucial way on the first stage, the kind of design used for producing the data.
2. The rough and qualified ordering of stages here is not meant to suggest that we think the topics taught in an introductory statistics course should follow the same order. For reasons presented later, we recommend beginning with methods for exploring and describing data, then going "back" to data production, and from there to formal inference.
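The first caution notes that knowing the size of subject-to-subject variation helps decide how many subjects are needed. A standard normal-approximation sample-size formula for comparing two means makes that concrete. The formula and the sketch below are an illustration added here, not the authors' method. The function name `n_per_group` and the planning values are hypothetical, and real trial planning would typically use t-based or dedicated software calculations.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    where sigma is the subject-to-subject SD and delta the difference worth detecting.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)


# Hypothetical planning values: between-subject SD of 8 units, detect a 5-unit difference.
print(n_per_group(sigma=8, delta=5))
print(n_per_group(sigma=12, delta=5))  # more variation between subjects -> more subjects
```

The required size grows with the square of sigma. This is why an estimate of variation from data already in hand feeds back into the design.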
With these cautions assumed, the flowchart can provide a useful framework for examining the role of mathematics in statistics and summarizing elements of the non-mathematical substance of the subject. Here are four quick observations:
1. Design, exploration, and interpretation are core elements of statistical thinking. All three elements are heavily dependent on context, but at the introductory level they involve very little mathematics. The (largely non-mathematical) theory of experimental design is decades old and well developed. The theory of exploration is newer, and at present still primitive, although computer-based tools for exploration have become quite sophisticated. The theory of interpretation is fragmentary at best.
2. The classical course in mathematical statistics corresponds so neatly to transition (3) that "from models to methods" might almost serve as a course title. Context is largely irrelevant here, because models are presented abstractly, as in Example 2a, and a typical derivation simply applies one optimality principle or another (least squares, maximum likelihood) to deduce the method du jour.
3. Transition (4), from methods to results, is the focus of the old-style cookbook course, in which each method is summarized by a set of formulas. Context is irrelevant here also, in that you can learn computational algorithms, and in fact learn them more efficiently, if you resist any temptation to encumber your brain with concern about what the methods are good for. All the same, some courses have tried to make the throat-clogging bolus of rote easier to get down by sugar-coating it with a thin glaze of ersatz context. Fortunately, the computer is fast sweeping courses like these into the dustbin of curricular history.
4. It is perhaps ironic that transitions (3) and (4), the two that have most often been the focus of courses at the introductory level, are precisely the two that are intellectually most automatic (given our current limited understanding and less developed theory of the other transitions) and so offer the least room for judgment and creativity.
To develop these points in more detail, we return to the example of calcium and blood pressure. In what follows, we combine the stages of Figure 2 under three broader headings: data production, data analysis, and inference.

2. THE CONTENT OF STATISTICS

2.1. Data production. The standard model of Example 2a is incomplete in a most serious way: it does not distinguish between observational data (e.g., from a sample survey) and data from a randomized comparative experiment. This distinction, between observation and experiment, is one of the most important in statistics. Researchers often want to reach causal conclusions: calcium causes a reduction in blood pressure. Experiments often allow causal conclusions, while observational studies almost always leave issues of causation unsettled and subject to debate. Yet the mathematical models of statistical theory are identical for observational and experimental data.
The calcium study was in fact an experiment:

Example 2c. The design of the calcium study [21]. Examination of a large sample of people revealed a relationship between calcium intake and blood pressure. The relationship was strongest for black men. Researchers therefore conducted an experiment.

The subjects in part of the experiment were 21 healthy black men. A randomly chosen group of 10 of the men received a calcium supplement for 12 weeks. The control group of 11 men received a placebo pill that looked identical. The experiment was double-blind.

Can we conclude that calcium has caused a reduction in blood pressure? Such an inference, that an observed difference may be taken at face value, stands on three legs. Two of the three are grounded in data production:
(1) an argument (automatic only for random samples and randomized experiments) that a probability model applies to the data;
(2) an argument (probability-based, and comparatively straightforward) that the observed difference is "real," i.e., too big to be plausibly explained as due just to chance variation; and
(3) an argument (often thorny and fraught with pitfalls, except in the case of randomized experiments) that the observed difference is not due to some confounding influence distinct from the factor of interest.
The t-test of Example 2a, like all statistical tests and confidence intervals, deals only with the second argument: "If we assume that a particular chance model applies, how likely is it to get an observed difference this big?" The other two arguments depend on the design.
The clinical trial on the effect of calcium on blood pressure was a randomized comparative experiment. Figure 3 presents the design in outline form.

The great virtue of assigning the subjects at random is that it makes arguments (1) and (3) automatic. It thus reduces the problem of inferring cause to checking the fit of a model and then, given adequate fit, carrying out a straightforward calculation. The random assignment of subjects eliminates bias in forming the treatment groups and produces groups that differ only through chance variation before we apply the treatments. The comparative design reminds us that all subjects are treated exactly alike except for the contents of the pills they take. Thus if we observe differences in the mean reduction in blood pressure greater than could be expected to arise by chance, we can be confident that the calcium brought about the effect we see.

                   -> Group 1, 10 patients -> Treatment 1: Calcium -> \
Random allocation                                                        Compare blood pressure
                   -> Group 2, 11 patients -> Treatment 2: Placebo -> /

Figure 3. The simplest randomized comparative experiment.
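Random allocation itself is simple to carry out. The following sketch is an illustration only; the article does not describe how the trial's allocation was generated. It uses Python's `random.Random.sample` to choose 10 of 21 hypothetical subject labels for the calcium group and assigns the remaining 11 to placebo. The seed value is arbitrary and serves only to make the allocation reproducible and auditable.

```python
import random


def randomize(subjects, n_treatment, seed=None):
    """Randomly split subjects into a treatment group of size n_treatment and a control group."""
    rng = random.Random(seed)                            # seeded generator: reproducible allocation
    treatment = set(rng.sample(subjects, n_treatment))   # sampling without replacement
    control = [s for s in subjects if s not in treatment]
    return sorted(treatment), control


subject_ids = list(range(1, 22))   # 21 hypothetical subject labels, 1..21
calcium_group, placebo_group = randomize(subject_ids, 10, seed=1997)
print("calcium:", calcium_group)
print("placebo:", placebo_group)
```

Chance, not the investigator, decides who gets which pill. This is what makes arguments (1) and (3) automatic.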

The other major means of producing data are sample surveys that choose and examine a sample in order to produce information about a larger population. Interesting examples abound: opinion polls sound and unsound, government collection of economic and social data, and academic data sources such as the National Opinion Research Center at the University of Chicago. Statistical designs for sampling begin by insisting that impersonal chance should choose the sample.

The central idea of statistical designs for producing data, through either sampling or experimentation, is the deliberate use of chance. Explicit use of chance mechanisms eliminates some major sources of bias. It also ensures that quite simple probability models describe our data production processes, and therefore that standard inference methods apply. However, unlike randomized experiments, observational studies do not lend themselves in so straightforward a way to an inference of causation, as the following example shows. The original study by Best and Walker appears as an example in [12]; our presentation here follows [26].

Example 3. Smoking and health. One of the early observational studies of smoking and health compared mortality rates for three groups of men. The rates, in deaths per year per 1000 men, were:
Non-smokers 20.2, Cigarette smokers 20.5, Cigar and pipe smokers 35.5.
To test whether the observed differences might be due to chance, we could use a model similar to the one in Example 2a. The sample sizes were so large that we can easily rule out chance variation as an explanation for the observed differences. That leaves us with the apparent conclusion that cigarettes pose little risk but pipes or cigars or both are quite dangerous. Indeed, that conclusion would be valid if these data had come from a randomized, controlled double-blind experiment like the calcium study. However, the premise is clearly untenable.

Because this is an observational study, we need to ask about other factors, linked to smoking habits, that might be responsible for the observed difference. Here, age is the main such factor: pipe and cigar smokers tend to be older than cigarette smokers, and the risk of death increases with age. In this study, the average ages for the three groups were:
Non-smokers 54.9 years, Cigarette smokers 50.5 years, Cigar and pipe smokers 65.9 years.
Only after adjusting the death rates for the differences in age do we get numbers more in line with what we have come to expect:
Non-smokers 20.3, Cigarette smokers 28.3, Cigar and pipe smokers 21.2.
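The text does not say how the age adjustment was carried out. Direct standardization is one common method: apply each group's age-specific death rates to a single standard age distribution.

The sketch below uses invented two-stratum numbers, not the study's data, and a hypothetical helper `age_weighted_rate`. It shows how an age difference between groups can reverse a comparison, much as it does in Example 3.

```python
def age_weighted_rate(age_specific_rates, age_weights):
    """Sum of age-specific rates weighted by each age stratum's share of people.

    With a group's own age mix as the weights this is the crude rate; with a
    common standard population as the weights it is a directly standardized rate.
    """
    return sum(age_specific_rates[age] * w for age, w in age_weights.items())


# Hypothetical numbers invented for illustration; NOT data from the smoking study.
# Deaths per 1000 men per year within each age stratum:
rates_a = {"under 60": 15.0, "60 and over": 60.0}   # group A: higher risk at every age
rates_b = {"under 60": 10.0, "60 and over": 50.0}
# Each group's own age mix (group B is much older):
mix_a = {"under 60": 0.8, "60 and over": 0.2}
mix_b = {"under 60": 0.3, "60 and over": 0.7}
# One common standard population applied to both groups:
standard = {"under 60": 0.5, "60 and over": 0.5}

crude = (age_weighted_rate(rates_a, mix_a), age_weighted_rate(rates_b, mix_b))
adjusted = (age_weighted_rate(rates_a, standard), age_weighted_rate(rates_b, standard))
print("crude:", crude)        # group B looks worse, only because it is older
print("adjusted:", adjusted)  # group A is worse once age is held fixed
```

The choice of standard population changes the adjusted numbers but not the lesson. Crude rates mix the effect of the factor of interest with the effect of age.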

Taken together, the last two examples offer what we consider two of the most important lessons for mathematicians who teach statistics: one, the conclusions from a study depend crucially on how the data were produced, and two, the standard mathematical models ignore data production.
Statistical ideas for producing data to answer specific questions are the most influential contributions of statistics to human knowledge. Badly designed data production is the most common serious flaw in statistical studies. Well-designed data production allows us to apply standard methods of analysis and reach clear conclusions. Professional statisticians are paid for their expertise in designing studies; if the study is well designed (and no unanticipated disaster occurred), you don't need a professional to do the analysis. In other words, the design of data production is really important. If you just say "Suppose $X_1$ to $X_n$ are iid observations," you aren't teaching statistics.
2.2. Data analysis: exploration and description. Data analysis is the contemporary form of "descriptive statistics," powered by more numerous and more elaborate descriptive tools, but especially by a philosophy due in large measure to John Tukey of Bell Labs and Princeton. The philosophy is captured in the now-common name, exploratory data analysis, or EDA. The goal of EDA is to see what the data in hand say, on the analogy of an explorer entering unknown lands. We put aside (but not forever) the issue of whether these data represent any larger universe.

Table 1 presents an elementary summary [25] of the distinctions between EDA and standard inference:

TABLE 1. EXPLORATORY DATA ANALYSIS VS. FORMAL PROBABILITY-BASED INFERENCE

| Exploratory Data Analysis | Statistical Inference |
|---|---|
| Purpose is unrestricted exploration of the data, searching for interesting patterns. | Purpose is to answer specific questions, posed before the data were produced. |
| Conclusions apply only to the individuals and circumstances for which we have data in hand. | Conclusions apply to a larger group of individuals or a broader class of circumstances. |
| Conclusions are informal, based on what we see in the data. | Conclusions are formal, backed by a statement of our confidence in them. |

In practice, exploratory analysis is a prerequisite to formal inference. Most real data contain surprises, some of which can invalidate or force modification of the inference that was planned. This is one reason why running data through a sophisticated (and therefore automated) inference procedure before exploring them carefully is the mark of a statistical novice. The dialog between data and models continues with more advanced diagnostic tools that allow data to criticize specific models. These tools combine the EDA spirit with the results of mathematical analysis of the consequences of the models.
As we have already seen, the model of Example 2a, because it does not distinguish between observation and experiment, is incomplete. It is also, like most idealized mathematical models for real phenomena, unrealistic. In the words attributed to the statistician George Box, "All models are wrong, but some are useful." The user of inference methods based on this model must carefully explore its adequacy to the setting and the data.

Were there flaws in the data production (whether sample or experiment) that render inference meaningless? Are the data, which are certainly not independent observations on a perfectly normal distribution, sufficiently normal to allow use of standard procedures? This question is answered by exploratory examination of the data themselves, combined with knowledge of how "robust" the planned analysis is under deviations from the assumptions of the model.
Example 2d. Preliminary exploration of the calcium data. An analysis might start from a simple outline: plot, shape, center, spread.
Plot. A stemplot splits each data value into a stem and leaf, then sorts leaves onto shared stems. Figure 4 shows a back-to-back stemplot useful for comparing two groups:

      Placebo |    | Calcium
            1 | -1 |
            5 | -0 | 5
        33111 | -0 | 234
           32 |  0 | 1
            5 |  0 | 7
            2 |  1 | 01
              |  1 | 78

Figure 4. Parallel stemplot of reduction in systolic blood pressure for two groups of men.
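Figure 4 uses a split-stem layout. Each stem appears twice, once for leaves 0 to 4 and once for leaves 5 to 9, and small negative values get a separate "-0" stem. This layout can be produced programmatically.

The sketch below is hypothetical; it is not software used by the authors, and the helper names are invented here. With the calcium data it reproduces the rows of Figure 4.

```python
from collections import defaultdict


def split_stem_bin(v):
    """Ordered bin index for integer v, using split stems (leaves 0-4 and 5-9).

    Negative values get their own bins, so -3 lands on stem "-0", not "0".
    """
    a = abs(v)
    half = 2 * (a // 10) + (1 if a % 10 >= 5 else 0)
    return half if v >= 0 else -half - 1


def stem_label(b):
    """Printed stem for a bin index produced by split_stem_bin."""
    return str(b // 2) if b >= 0 else "-" + str((-b - 1) // 2)


def back_to_back(left, right):
    """Lines of a back-to-back stemplot; leaves grow away from the stem on both sides."""
    bins_l, bins_r = defaultdict(list), defaultdict(list)
    for v in left:
        bins_l[split_stem_bin(v)].append(abs(v) % 10)
    for v in right:
        bins_r[split_stem_bin(v)].append(abs(v) % 10)
    all_bins = list(bins_l) + list(bins_r)
    lines = []
    for b in range(min(all_bins), max(all_bins) + 1):   # every bin, even empty ones
        l_leaves = "".join(str(d) for d in sorted(bins_l[b], reverse=True))
        r_leaves = "".join(str(d) for d in sorted(bins_r[b]))
        lines.append(f"{l_leaves:>6} | {stem_label(b):>2} | {r_leaves}")
    return lines


placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]
calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
print("\n".join(back_to_back(placebo, calcium)))
```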

Shape. The distribution for the placebo group is unimodal and symmetric. The treatment group, however, contains a faint suggestion of bimodality, which raises the possibility of two kinds of subjects. Might there be some who respond to calcium, and others who do not? There is no way to tell from these data, but the possibility is worth noting.

Center and spread. A useful plot for comparing centers, spreads, and symmetries is the boxplot (Figure 5). Each box locates the quartiles and median of a distribution. The "whiskers" extend from the quartile to the most extreme points within 1.5 interquartile ranges of the nearest quartile, and points farther out are shown separately. Here we find a difference in medians, but also a pronounced difference in spreads, one that should raise suspicions about the assumption of equal variances used to justify a pooled estimate in Example 2a.
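The boxplot rule just described can be computed directly. The following sketch is illustrative and not from the article. It assumes Python 3.8 or later for `statistics.quantiles`, and `box_summary` is a hypothetical helper name. Quartile conventions differ between textbooks and packages, and the choice can move the fences enough to change which points are flagged.

```python
from statistics import quantiles


def box_summary(data, k=1.5):
    """Quartiles, median, whisker ends, and flagged points for a Tukey-style boxplot.

    Quartiles come from statistics.quantiles(method="inclusive"); other quartile
    conventions can shift the fences slightly and change which points are flagged.
    """
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # most extreme points within the fences
        "outside": sorted(x for x in data if x < lo_fence or x > hi_fence),
    }


calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]
for name, group in [("calcium", calcium), ("placebo", placebo)]:
    print(name, box_summary(group))
```

With this convention the calcium IQR (13.5) is more than twice the placebo IQR (5.5), which is the difference in spreads the text points to. The placebo value 12 falls just beyond its upper fence. With the module's default "exclusive" quartiles, the same value sits exactly on the fence and is not flagged.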

[Figure 5 plot: side-by-side boxplots for the Placebo and Calcium groups; vertical axis from about -10 to 20.]
Figure 5. Parallel boxplots of reduction in systolic blood pressure for two groups of men.

Normal quantile plot. Looking ahead to a t-test to compare means, it is prudent to ask whether the data give us reason to question the normal model of Example 2a. Here we subtract the group mean from each observation to get residuals, then plot the ordered residuals against the corresponding quantiles of a normal distribution; see Figure 6. Our ordinates are the 21 ordered residuals, which divide the real line into 22 sub-intervals. The corresponding abscissas are the 21 values that divide the real line into 22 segments that are equiprobable under the normal model. If the data come from a single normal distribution, we can expect the points to fall near a line.
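The construction just described can be sketched in a few lines. This illustrative sketch assumes Python 3.8 or later for `statistics.NormalDist`. It pairs the 21 ordered residuals with the normal quantiles $\Phi^{-1}(i/22)$ for $i = 1, \ldots, 21$, which is the equal-probability rule stated above. Other plotting positions, for example $(i - 0.5)/21$, are also in common use and give a nearly identical picture.

```python
from statistics import NormalDist, mean

calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]

# Residuals: each observation minus its own group mean, pooled and sorted.
residuals = sorted([x - mean(calcium) for x in calcium] +
                   [x - mean(placebo) for x in placebo])

# Abscissas: the k values that cut the standard normal into k + 1 equal-probability pieces.
k = len(residuals)   # 21
z = NormalDist()     # standard normal (mean 0, sd 1)
scores = [z.inv_cdf(i / (k + 1)) for i in range(1, k + 1)]

for s, r in zip(scores, residuals):
    print(f"{s:6.2f}  {r:7.2f}")
```

Plotting `residuals` against `scores` with any plotting tool gives a picture of the kind shown in Figure 6.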

[Figure 6 plot: ordered residuals (vertical axis, about -10 to 15) against normal scores (horizontal axis).]
Figure 6. Normal quantile plot for the blood pressure data.

For the calcium data, the pattern is reasonably linear, although the vertical jump before the three right-most points shows observed residuals that are larger than predicted by the normal model, a pattern consistent with the unequal spreads in the boxplots.
Mathematically structured instruction, which tends to emphasize how methods follow from models, often provides only the most general warnings about the realities of practice. Statistics in practice resembles a dialog between models and data. Models for the process that produced our data do indeed play a central role in statistical inference. The mathematical exploration of properties and consequences of models is therefore important (as it is in economics and physics). But the data are also allowed to criticize and even falsify proposed models.

In the calcium examples, the exploratory analysis warns us not to rely heavily on the assumption of equal variances, and to use a modified t-test that estimates separate variances for the two groups. We can modify Box's dictum into a practical version of the statement that statistics is not just mathematics: Mathematical theorems are true, statistical methods are sometimes effective when used with skill.
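The text does not name the modified test. One widely used version is Welch's t statistic with Welch-Satterthwaite approximate degrees of freedom, sketched below for the calcium data. The sketch is an illustration, not necessarily the authors' procedure. `welch_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(xs, ys):
    """Two-sample t statistic with separate variance estimates (Welch),
    plus the Welch-Satterthwaite approximate degrees of freedom."""
    vx, vy = variance(xs) / len(xs), variance(ys) / len(ys)  # s^2 / n for each group
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]
t, df = welch_t(calcium, placebo)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = 1.604, df = 15.6
```

For comparison, the pooled statistic of Example 2a on the same data is about 1.63 on 19 degrees of freedom. The two agree closely here, but the Welch version does not lean on the equal-variance assumption that the boxplots call into question. If SciPy is installed, `scipy.stats.ttest_ind(calcium, placebo, equal_var=False)` computes the same Welch statistic along with a p-value.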
Wide availability of cheap computing, especially graphics, has combined with the desire to "let the data speak" to generate an abundance of new tools. At the low end we have the stemplots and boxplots of Example 2d, but there are also model-free scatterplot smoothers, resistant regression algorithms, clever ideas for display of high-dimensional data on two-dimensional screens, and many still more advanced diagnostic tools for specific situations. Standard statistical software implements much of this. The books [7] and [9], by Bell Labs scientists influenced by Tukey, present much of the basic graphical material. The software packages S and S-PLUS, which originated at Bell Labs, implement more of the new graphics and also implement several new classes of models. See [8] for detailed discussion of the latter.


Although it may be tempting for the neophyte to view data analysis as merely a collection of clever tools, the value of these tools comes from using them in a systematic way, according to strategies that organize the examining of data:
1. Proceed from simple to complex: first examine each variable individually, then look at relationships among them.
2. Use a hierarchy of tools: first plot the data, then choose appropriate numerical descriptions of specific aspects of the data, then if warranted select a compact mathematical model for the overall pattern of the data.
3. Look at both the overall pattern and at any striking deviations from that pattern.
It is part of the unifying (but non-mathematical)theory of EDA that these
principles apply in each of several settings. Given data on a single quantitative
variable,we might displaythe distributionby a stemplot, note that it reasonably
symmetric,calculate the mean and standarddeviation as numericalsummaries,
and use a normalquantileplot to see whether a normaldistributionis a suitable
compactmodel for the overallpattern.Given two quantitativevariables,we drawa
scatterplot,measurethe directionand strengthof linear associationby the correla-
tion, and, if warranted,use a fitted straightline as a model for the overallpattern.
Thus the univariate"Plot, shvpe,center,spread,"returnsin the contextof bivariate
data as 4'PIot,shape, directaon,st;rength."
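The first steps of the univariate recipe (plot the distribution, then summarize center and spread) can be sketched in a few lines of Python. This is a minimal illustration, not the software the authors refer to: `stemplot` is a hypothetical helper that assumes non-negative integer data, with tens digits as stems and units digits as leaves, and the scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def stemplot(values):
    """Return the lines of a text stemplot for non-negative integers.

    Stems are the tens digits and leaves the units digits; empty stems
    between the smallest and largest are kept so that gaps stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 63, 64, 67, 70, 71, 71, 75, 78, 83, 86]

for line in stemplot(scores):
    print(line)

# If the display looks roughly symmetric, the mean and standard deviation
# are reasonable numerical summaries of center and spread.
print(f"mean = {mean(scores):.2f}, standard deviation = {stdev(scores):.2f}")
```

A normal quantile plot would be the natural next step; it needs a plotting library and is omitted from this sketch.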
Here, as elsewhere, an analysis is not just a search for patterns, but a search for meaningful patterns. The best fit is not necessarily the most useful, as the following example illustrates.

Example 3. Dormitories and cities. Each point in Figure 7 represents one of the 50 U.S. states, with horizontal coordinate equal to the state's urban population and vertical coordinate equal to the number of the state's college students housed in dormitories. Several features of the plot's shape stand out. For example, the plot is fan shaped, with many points bunched in the lower left: most states have relatively small urban populations (a couple of million or so) and relatively small dormitory populations as well (under 50,000); only a few states have very large urban populations or very large dormitory populations, and the variability from state to state is larger (more space between points) for the states with larger values. The pattern of association between the two variables is positive and strong: smaller urban populations go with smaller dormitory populations, larger urban populations with larger dormitory populations, and, for all but a few of the states, knowing the size of a state's urban population allows us to predict its dormitory population to within a fairly narrow range.

[Figure 7 plot not reproduced; horizontal axis: Urban population (millions), 0 to 30.]
Figure 7. Scatterplot of dormitory population versus urban population for the 50 U.S. states.
Despite the nice fit between picture and story, the analysis so far has overlooked a most important feature. If we take at face value the pattern that states with large urban populations also have large dormitory populations, we might be tempted to conclude that cities must attract colleges. Although plenty of confirming instances come to mind, this naive interpretation is wrong: both our variables are indirect measures of the size of the states' populations, so it is hardly surprising that the two measures show a strong positive association. To uncover a more meaningful relationship, we have to "adjust for the lurking variable": divide urban population by total population to get percent urban, divide dormitory population by total population to get percent living in dormitories, and plot the result (Figure 8).

[Figure 8 plot not reproduced; horizontal axis: Percentage of population in urban areas, 25 to 100; labeled points include VT, RI, and AK.]
Figure 8. Scatterplot of the dorms-and-cities data after adjusting for the "lurking variable" population.

Now the relationship is weaker, but what it tells us is more interesting. The direction is reversed: rural states (those with a lower percentage of their residents living in metropolitan areas) have a higher percentage of their residents living in college dormitories. On reflection, this makes sense. Think about Pullman, Washington, or Ames, Iowa; about Norman, Oklahoma, or Lawrence, Kansas. Rural states may have fewer colleges and universities in absolute numbers, but their students make up a higher percentage of the total population of the state, and are more likely to live in dormitories.
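The reversal in Example 3 can be reproduced in miniature. The sketch below uses four invented states (not the census data behind Figures 7 and 8) to show how two raw counts that both track total population correlate positively, while the same data expressed as percentages of population correlate negatively. The `correlation` helper is written out so that the arithmetic is visible.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical states: (total population, urban population, dormitory residents).
states = [
    (1_000_000, 300_000, 20_000),
    (2_000_000, 1_600_000, 18_000),
    (5_000_000, 3_000_000, 60_000),
    (20_000_000, 18_000_000, 160_000),
]

# Raw counts: both grow with the size of the state.
urban = [u for _, u, _ in states]
dorm = [d for _, _, d in states]
r_raw = correlation(urban, dorm)

# Adjusting for the lurking variable: divide each count by total population.
pct_urban = [100 * u / t for t, u, _ in states]
pct_dorm = [100 * d / t for t, _, d in states]
r_adjusted = correlation(pct_urban, pct_dorm)

print(f"counts:      r = {r_raw:+.3f}")
print(f"percentages: r = {r_adjusted:+.3f}")
```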

2.3. Formal inference: the argument against chance. Statistical inference provides methods for drawing conclusions from data about the population or process from which the data were drawn. It now becomes essential (as it was not in data analysis) to distinguish sample statistics from population parameters. The true values of the parameters are unknown to us. We have the statistics in hand, but they would take different values if we repeated our data production. Inference must take this sample variability into account.
Probability describes one kind of variability, the chance variability in random phenomena. When a chance mechanism is explicitly used to produce data, probability therefore describes the variation we expect to see in repeated samples from the same population or repeated experiments in the same setting. That is, probability answers the question, "What would happen if we did this many times?"
Standard statistical inference is based on probability. It offers conclusions from data along with an indication of how confident we are in the conclusions. The statement of confidence is based on asking "What would happen if I used this inference method many times?" That is exactly the kind of question probability can answer, which is why we ask it. The indication of our confidence in our methods, expressed in the language of probability, is what distinguishes formal inference from informal conclusions based on, e.g., an exploratory analysis of data.
Any particular inference procedure starts with a statistic, perhaps several statistics, calculated from the sample data. The sampling distribution is the probability distribution that describes how this statistic would vary if we drew many samples from the same population. In elementary statistics we present two types of inference procedures, confidence intervals and significance tests. A confidence interval estimates an unknown parameter. A significance test assesses the evidence that some sought-after effect is present in the population.
A confidence interval consists of a recipe for estimating an unknown parameter from sample data, usually of the form "estimate ± margin of error," and a confidence level, which is the probability that the recipe actually produces an interval that contains the true value of the parameter. That is, the confidence level answers the question, "If I used this method many times, how often would it give a correct answer?"
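The question "how often would it give a correct answer?" can be checked directly by simulation. The sketch below assumes the simplest textbook setting, a normal population with known standard deviation, so that the recipe is $\bar{x} \pm z^* \sigma/\sqrt{n}$; the values `mu = 10`, `sigma = 2`, sample size 25, and 2000 repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Interval xbar +/- z* sigma / sqrt(n), assuming sigma is known."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(len(sample))
    xbar = fmean(sample)
    return xbar - margin, xbar + margin


def coverage(mu=10.0, sigma=2.0, n=25, reps=2000, confidence=0.95, seed=0):
    """Fraction of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        hits += low <= mu <= high
    return hits / reps


print(f"95% intervals covered mu in {coverage():.1%} of 2000 samples")
```

The simulated proportion should land close to, but not exactly at, the nominal 95%; that leftover wobble is itself chance variation in the simulation.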
A significance test starts by supposing that the sought-after effect is not present in the population. It asks "In that case, is the sample result surprising or not?" A probability (the p-value) says how surprising the sample result is. A result that would rarely occur if the effect we seek were absent is good evidence that the effect is in fact present. Figure 9 illustrates this reasoning in our medical example. The normal curves in that figure represent the sampling distribution of the difference $\bar{x} - \bar{y}$ between the mean blood pressure decreases in the calcium and placebo groups, for the case of no difference between the two population means. This distribution, which shows the variability due to chance alone, has mean 0. Outcomes greater than 0 come from experiments in which calcium reduces blood pressure more than the placebo. If we observe result A, we are not surprised; an outcome this far above 0 would often occur by chance. It provides no credible evidence that calcium beats the placebo. If we observe result B, on the other hand, the experiment has produced an effect so strong that it would almost never occur simply by chance. We then have strong evidence that the calcium mean does exceed the placebo mean. The p-value (the right tail probability) is 0.24 for point A and 0.0005 for point B. These probabilities quantify just how surprising an observation this large is when there is no effect in the population. What about the actual data? Point C shows the observed value $\bar{x} - \bar{y} = 5.273$. The corresponding p-value is 0.055. Calcium would beat the placebo by at least this much in 5.5% of many experiments just by chance variation. The experiment gives some evidence that calcium is effective, but not extremely strong evidence. A note for those who worry about details: These p-value calculations took the variability of the sample means to be known. In practice, we must estimate standard deviations from the data. The resulting test has a larger p-value: p = 0.072.

[Figure 9 plot not reproduced; normal curves with observed results A and B marked.]
Figure 9. The idea of statistical significance: is this observation surprising?
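Under the known-variance normal model used for these calculations, each right-tail probability corresponds to a distance above 0 measured in standard deviations of the sampling distribution. The sketch below uses the standard-library `statistics.NormalDist` to convert the three p-values quoted in the text back into those standard scores; it cannot recover the blood pressure data themselves, which are not reproduced here, and it does not cover the t-based p = 0.072.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def right_tail_p(z):
    """One-sided p-value: P(Z >= z) for a standard normal Z."""
    return 1 - STANDARD_NORMAL.cdf(z)


def z_for_right_tail(p):
    """Standard score whose right-tail probability is p."""
    return STANDARD_NORMAL.inv_cdf(1 - p)


# p-values quoted in the text for points A, B, and C.
quoted = {"A": 0.24, "B": 0.0005, "C": 0.055}
for point, p in quoted.items():
    z = z_for_right_tail(p)
    print(f"point {point}: p = {p}, about {z:.2f} standard deviations above 0")
```

Read this way, point C lies roughly 1.6 standard deviations above 0: unusual, but not so far out that chance alone is implausible.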

3. TEACHING. In discussing our teaching, we may focus on content, what we want our students to learn, or on pedagogy, what we do to help them learn. These two topics are of course related. In particular, changes in pedagogy are often driven in part by changing priorities for what kinds of things we want students to learn. It is nonetheless convenient to address content and pedagogy separately. This section, in keeping with the rest of this article, concerns content, and in particular contains one side of a conversation between statisticians and mathematicians who may find themselves teaching statistics.
3.1. Statistics should be taught as statistics. Statisticians are convinced that statistics, while a mathematical science, is not a subfield of mathematics. Like economics and physics, statistics makes heavy and essential use of mathematics, yet has its own territory to explore and its own core concepts to guide the exploration. Given those convictions, we would naturally prefer that beginning statistics be taught as statistics. The American Statistical Association and the MAA have formed a joint committee to discuss the curriculum in elementary statistics. The recommendations of that group reflect the view that statistics instruction should focus on statistical ideas. Here are some excerpts [10]; a longer discussion appears in [11]:

Almost any course in statistics can be improved by more emphasis on data and concepts, at the expense of less theory and fewer recipes. To the maximum extent feasible, calculations and graphics should be automated.
Any introductory course should take as its main goal helping students to learn the basics of statistical thinking. [These include] the need for data, the importance of data production, the omnipresence of variability, the quantification and explanation of variability.
The recommendations of the ASA/MAA committee reflect changes in the field of statistics over the past generation. Academic statistics, unlike mathematics, is linked to a larger body of non-academic professional practice. Computing technology has completely changed the practice of statistics. Academic researchers, driven in part by the demands of practice and in part by the capability of new technology, have changed their taste in research. Bootstrap methods, nonparametric data smoothing, regression diagnostics, and more general classes of models that require iterative fitting are among the recent fruits of renewed attention to analysis of data and scientific inference. Efron and Tibshirani [14] describe some of this work for non-specialists.
3.2. Neither Mathematics Nor Magic. An over-emphasis on probability-based inference is one mark of an overly mathematical introduction to statistics, and yet the reluctance of mathematically trained teachers to abandon a theory-driven presentation of basic statistics has a respectable basis: to avoid presenting statistics as magic. It is certainly common to teach beginning statistics as magic. The user of statistics is in many ways very like the sorcerer's apprentice. The incantation has an automatic effectiveness, rendering theses acceptable and studies publishable. We are not meant to understand how the incantation works; that is the domain of the sorcerer himself. The incantation must follow the recipe exactly, lest disaster ensue; exploration and flexibility, like understanding, are forbidden to the apprentice. Fortunately, the sorcerer has provided software that automates the exact following of approved incantations.
The danger of statistics-as-magic is real. But the proper defense is not a retreat to a mathematical presentation that is inadequate to the subject and often incomprehensible to students. Mathematical understanding is not the only kind of understanding. It is not even the most helpful kind in most disciplines that employ mathematics, where understanding of the target phenomena and core concepts of the discipline takes precedence. We should attempt to present an intellectual framework that makes sense of the collection of tools that statisticians use and encourages their flexible application to solve problems. Students understand mathematics when they appreciate the power of abstraction, deduction, and symbolic expression, and can use mathematical tools and strategies flexibly in dealing with varied problems. Reasoning from uncertain empirical data is a similarly powerful and pervasive intellectual method. How can we best lead our students to understand, appreciate, and begin to assimilate this intellectual method?
3.3. Begin with exploratory data analysis. Although the implied chronology of Figure 2 suggests starting with data production, experience says otherwise. For one thing, exploratory data analysis makes a better beginning because it is more concrete. There is no need to distinguish population and sample, and no need to discuss the features of randomization that protect against bias. Basic methods are conceptually and algorithmically simple, and the data are in hand (actual numbers on a page, as opposed to mere ghosts of data-in-the-future the way they are in designing an experiment). Moreover, providing motivation is not a problem. Students like exploratory analysis and find that they can do it, a substantial bonus when teaching a subject feared by many. Engaging them early on in the interpretation of results, before the harder ideas come along to claim their attention, can help establish good habits that pay dividends when you get to inference. Finally, starting with data analysis prepares for design and for inference. Experience with data distributions introduces students to the omnipresence of variability, and to the potential for bias, the two main reasons we need careful design. If you teach design before data analysis, it is harder for students to understand why design matters. Experience with data distributions is also the best way to get ready to tackle the difficult idea of a sampling distribution.
We have tried to suggest that there is a coherent (though not mathematical) set of ideas and associated tools for exploring data. Students need to practice these ideas and tools by writing coherent descriptions of data. To help them, we provide both outlines for what to write and examples that can serve as models. Figure 10, for example, is the outline for describing a single quantitative variable.
A. Describe the data
     number of observations
     nature of the variable
     how it was measured
     units of measurement
B. Plot the data; choose from
     dotplot
     stemplot
     histogram
C. Describe the overall pattern
     shape
          no clear shape?
          skewed or symmetric?
          single or multiple peaks?
     center and spread; choose from
          five-number summary
          mean and standard deviation
     is normality an adequate model (normal quantile plot)?
D. Look for striking deviations from the overall pattern
     outliers
     gaps or clusters
E. Interpret your findings in C and D in the language of the problem setting. Suggest plausible explanations for your findings.
Figure 10. Outline for describing data on a single quantitative variable.
Following this outline requires both knowledge of the tools it mentions and judgment to choose among them and interpret the results. Judgment is formed by experience with data. Students cannot at first "read" graphs any more than they can read words or equations. Here is an example of a basic one-variable data analysis. Describing relations among several variables requires more elaborate tools and finer judgment.
In a study of resistance to infection [2], researchers injected 72 guinea pigs with tubercle bacilli and measured their survival time in days after infection. Both a histogram (Figure 11) and a normal quantile plot (Figure 12) show that the distribution of survival times is strongly skewed to the right. There are no outliers: although some individuals survived far longer than the average, this appears to be a characteristic of the overall distribution rather than pointing to, for example, errors in measuring or recording these individuals.



[Figure 11 plot not reproduced; horizontal axis: Survival time, days, 0 to 600.]
Figure 11. Histogram of guinea pig survival times.

[Figure 12 plot not reproduced; horizontal axis: Survival time (days), 100 to 600.]
Figure 12. Normal quantile plot for guinea pig survival times.

The strong skewness suggests that the five-number summary (min = 43 days, first quartile = 82.5 days, median = 102.5 days, third quartile = 151.5 days, max = 598 days) is a better numerical summary than the mean and standard deviation ($\bar{x}$ = 141.8 days, s = 109.2 days). There is very large variation in survival times among the individuals: for example, the third quartile is almost 150% of the median and the largest 6 observations are more than double the median. Without more information, we cannot accurately predict the survival time of an infected individual. Moreover, standard t procedures should not be used for inference about survival time. Inference could employ a non-normal distribution as a model or seek a transformation to a scale that is more nearly normal.
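The five-number summary is easy to compute, though quartile conventions differ between textbooks and software packages. The sketch below uses one common convention (quartiles as medians of the lower and upper halves, with the overall median excluded when n is odd); the twelve values are invented right-skewed numbers, not the guinea pig data of [2]. The last lines redo one comparison from the text using the published quartiles.

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two observations.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd the overall median belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Hypothetical right-skewed survival times in days (not the data of [2]).
toy = [45, 60, 70, 80, 90, 100, 110, 130, 160, 250, 420, 600]
print("five-number summary:", five_number_summary(toy))
print("mean:", mean(toy), "median:", median(toy))  # mean > median: right skew

# Published quartiles from the guinea pig study quoted above.
q3_over_median = 151.5 / 102.5
print(f"Q3 / median = {q3_over_median:.3f}")  # "almost 150% of the median"
```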
Although many students come to a first statistics course expecting empty ritual, EDA offers them the pleasant surprise that the methods exist to serve the search for meaning. This surprise is so welcome that it carries a danger of pushing the pendulum too far the other way. Some students may drift into a complacent conviction that any story about the data that fits the patterns with coherence and plausibility must be true. The timing is right for a dose of design and skepticism.
3.4. Teach design as the bridge between data analysis and inference. An introduction to design for data production fits naturally between exploratory analysis and inference: sound design is what makes inference possible. Waiting to introduce probability distributions until after the basics of design has a number of advantages. For one thing, this order helps make clear that the justification for probability models must come from the randomness in the data production process, and so provides some protection against unthinking adoption of probability models. For another, learning about data production introduces students to essential concepts like population and sample, parameter and statistic, before they encounter the sampling distribution, which is conceptually difficult all by itself.
The single most important point for students to understand is why randomized comparative experiments are the gold standard for evidence of causation. A rich source of true-life cautionary tales is the book [6], edited by the physicians Bunker and Barnes and the statistician Mosteller, which contains striking examples of medical treatments that became standard in the days before medicine adopted randomized comparative experiments, and were found to be worthless when subjected to proper testing.
There is of course more to the statistical side of designing experiments and sample surveys than "randomize." The designs used in practice are often quite complex, and must balance efficiency with the need for information of varying precision about many factors and their interactions. Simple designs (randomized experiments comparing two or several treatments, simple random samples from one or several populations) illustrate the most important ideas and support the inference taught in a first statistics course. You must talk about these designs, but need not go farther. Some other important material, for example, procedures for developing and testing survey questions and for training and supervising interviewers, is not usually presented in statistics courses. Statistics students should be aware that these practical skills do matter, and that data production can go awry even when we start with a sound statistical design. How much time to spend here is a matter of your judgment of the needs of your audience.
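Computationally, the two simple designs named above come down to a shuffle and a draw without replacement. The sketch below uses Python's `random` module; the subject labels, group sizes, sampling frame, and seeds are hypothetical, and a real study would still need the practical safeguards the paragraph mentions.

```python
import random


def randomize_two_groups(subjects, seed=None):
    """Completely randomized design: shuffle, then split into two groups.

    With an odd number of subjects the second group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


subjects = [f"subject{i:02d}" for i in range(1, 21)]  # hypothetical labels
treatment, control = randomize_two_groups(subjects, seed=1)
print("treatment:", treatment)
print("control:  ", control)

sampling_frame = range(1, 501)  # hypothetical list of 500 units
print("SRS of 10:", sorted(simple_random_sample(sampling_frame, 10, seed=2)))
```

Fixing the seed makes the assignment reproducible for an audit trail; leaving it as `None` gives a fresh randomization each run.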
3.5. Inference: two barriers to understanding. Section 2.3 has described briefly how inference works. Because the details are in practice automated, we would like students to put most of their effort into grasping the ideas. They are not easy to grasp. The first barrier is the notion of a sampling distribution. Choose a simple setting, such as using the proportion $\hat{p}$ of a sample of workers who are unemployed to estimate the proportion $p$ of unemployed workers in an entire population. Physical examples (sampling beads from a box), computer simulations, and encouraging thought experiments all help convey the idea of many samples with many values of $\hat{p}$. Keep asking, "What would happen if I did this many times?" That question is the key to the logic of standard statistical inference.
Once the idea of a sampling distribution begins to settle, the tools of data analysis help us take the next steps. Faced with any distribution, we ask about shape, center, and spread. The shape of the sampling distribution of $\hat{p}$ is approximately normal. The mean is equal to the unknown population proportion $p$. This says that $\hat{p}$ as an estimator of $p$ has no bias, or systematic error. The precision of the estimator is described by the spread of the sampling distribution, which (thanks to normality) we measure by its standard deviation. We are now only details away from confidence intervals.
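The description of the sampling distribution of $\hat{p}$ (centered at $p$, with spread measured by its standard deviation) can be checked by a computer version of sampling beads from a box. In the sketch below the population proportion p = 0.10 and sample size n = 400 are hypothetical. It assumes the population is much larger than the sample, so draws are treated as independent, and it compares the simulated spread with the standard formula $\sqrt{p(1-p)/n}$, which the text does not state explicitly.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_phat(p, n, reps, seed=0):
    """Return `reps` simulated sample proportions from samples of size n.

    Each of the n draws is a "success" (an unemployed worker) when a
    uniform random number falls below p.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.10, 400  # hypothetical unemployment rate and sample size
phats = simulate_phat(p, n, reps=2000)

center = fmean(phats)           # close to p: no bias
spread = pstdev(phats)          # simulated standard deviation of p-hat
theory = sqrt(p * (1 - p) / n)  # textbook formula for comparison

print(f"mean of p-hat = {center:.4f} (p = {p})")
print(f"sd of p-hat   = {spread:.4f} (formula: {theory:.4f})")
```

A histogram of `phats` would show the approximately normal shape; it is left out to keep the sketch free of plotting dependencies.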
The second major barrier is the reasoning of significance tests. Although the basic idea ("Is this outcome surprising?") is not recondite, the details are daunting. There's no escape from null and alternative hypotheses and one- versus two-sided tests. The logic of testing, which starts out "Suppose for the sake of argument that the effect we seek is not present . . . ," isn't straightforward. We'd like most of our students to understand the idea of a sampling distribution; we know that quite a few won't understand the reasoning of significance tests. Our fallback position is to insist that they be able to verbalize the meaning of p-values produced by software or reported in a journal. This is part of insisting that students write succinct summaries of statistical findings. "The study compared two methods of teaching reading to third-grade students. A two-sample t test comparing the mean scores of the two treatment groups on a standard reading test had p-value p = 0.019. That is, the study observed an effect so large that it would occur just by chance only about 2% of the time. This is quite strong evidence that the new method does result in a higher mean score than the standard method."
Two concluding remarks about inference. First, a conceptual grasp of the ideas is almost pictorial, based on picturing the sampling distribution and following the tactics learned in data analysis. No amount of formal mathematics can replace this pictorial vision, and no amount of mathematical derivation will help most of our students see the vision. The mathematics is essential to our knowing the facts, but this does not imply that we should impose the mathematics on our students.
Second, we want our students to know a good deal more than the big picture and several recipes that implement it in specific settings. Here are some further points, both practical and conceptual, roughly in order of importance. How far down the list you should go depends on your audience.

* Study of specific inference procedures reveals behaviors that are common and that all students should understand. To get higher confidence from the same data, you must pay with a larger margin of error. Even effects so small as to be practically unimportant are highly significant in the statistical sense if we base a significance test on a very large sample.
* Lots of things can go wrong that make inference of dubious value. Comparing subjects who choose to take calcium against others who don't tells little about the effects of calcium, because those who choose to take calcium may be very health-conscious in general. One extreme outlier could pull the conclusion of our medical experiment in either direction, again invalidating the inference. Examine the data production. Plot the data. Then, perhaps, go on to inference.
* Inference procedures themselves don't tell us that something went wrong. The margin of error in a confidence interval, for example, includes only the chance variation in random sampling. As the New York Times says in the box that accompanies its opinion poll results, "In addition to sampling error, the practical difficulties of conducting any survey of public opinion may introduce other sources of error into the poll."
* Common inference procedures really are based on mathematical models like the one that appears in our medical example: $X_1, X_2, \ldots, X_n$ iid $N(\mu_1, \sigma_1)$, $Y_1, Y_2, \ldots, Y_m$ iid $N(\mu_2, \sigma_2)$. This model isn't exactly true; is it useful? In fact, the two-sample t procedures that follow from this model when we want to compare $\mu_1$ and $\mu_2$ are quite robust against non-normality, so the model does lead to practically useful procedures. But the variance ratio F statistic for comparing $\sigma_1$ and $\sigma_2$ is extremely sensitive to non-normality, so much so that it is of little practical value. Even beginners need to be aware of such issues.
* We often want to do inference when our data do not come from a random sample or randomized comparative experiment. Think, for example, of measurements on successive parts flowing from an assembly line. Inference is justified by a probability model for the process that produced our data, and the correctness of the model can to some extent be assessed from the data themselves. Randomized data production is the paradigm and the most secure setting for inference, but it is not the only allowable setting.
* Inductive inference from data is conceptually complex. It's not surprising that there are alternative ways of thinking about it. Standard statistical theory tends to think of inference as if its purpose were to make decisions. A test must decide between the null and alternative hypotheses, for example. This leads at once to Type I and Type II errors and so on. The decision-making approach fits uneasily with the "Is this outcome surprising?" logic expressed by p-values. We think that assessing the strength of evidence is a much more common goal than making a decision, but not everyone agrees. The Bayesian school of thought goes farther, by introducing an explicit description of the available prior information into any statistical setting and combining prior information with data to reach a decision. Almost all statisticians think this is sometimes a good idea. Bayesians think all statistical problems can be made to fit this paradigm. This is a (strongly held) minority position. Deep water ahead.
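The two behaviors in the first item of the list above can be made concrete with the known-σ normal formulas. The numbers in the sketch below are hypothetical: σ = 1 and n = 100 for the intervals, and, for the tests, an observed difference equal to a practically negligible true effect of 0.01σ.

```python
from math import erfc, sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Half-width z* sigma / sqrt(n) of a z confidence interval for a mean."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star * sigma / sqrt(n)


def one_sided_p(observed, sigma, n):
    """Right-tail p-value for an observed mean difference, sigma known.

    erfc keeps tiny tail probabilities accurate where 1 - cdf would round to 0.
    """
    z = observed / (sigma / sqrt(n))
    return 0.5 * erfc(z / sqrt(2))


# Same data (sigma = 1, n = 100): higher confidence costs a wider interval.
levels = (0.90, 0.95, 0.99)
margins = [margin_of_error(1.0, 100, c) for c in levels]
for c, m in zip(levels, margins):
    print(f"confidence {c:.0%}: margin of error {m:.3f}")

# A negligible effect of 0.01 sigma becomes "highly significant" as n grows.
sizes = (100, 10_000, 1_000_000)
p_values = [one_sided_p(0.01, 1.0, n) for n in sizes]
for n, pv in zip(sizes, p_values):
    print(f"n = {n:>9,}: p = {pv:.3g}")
```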
3.6. What About Probability? Probability is an essential part of any mathematical education. It is an elegant and powerful field of mathematics that enriches the subject as a whole by its interactions with other fields of mathematics. Probability is also essential to serious study of applied mathematics and mathematical modeling. The domain of determinism in natural and social phenomena is limited, so that the mathematical description of random behavior must play a large role in describing the world. Whether our mathematical tastes run to purity or modeling, probability helps to satisfy them. Here, however, we are discussing introductory statistics rather than mathematics.
From the point of view of deductive logic that has shaped so much of statistical teaching in the past, probability is more basic than statistics: probability provides the chance models that describe the variability in observed data. From the point of view of the development of understanding, however, we believe that statistics is more basic than probability: whereas variability in data can be perceived directly, chance models can be perceived only after we have constructed them in our own minds. In the ideal Platonic world of mathematics, we can start with a probabilistic chicken and use deductive logic to lay a statistical egg, but in the messier world of empirical science, we must start with the egg as observed data and construct a prior probabilistic chicken as an inference. In an introductory statistics course, the chicken's only value is to explain where eggs come from. It seems a bit unfair, in that context at least, to ask beginning students to learn about egg-generators before they've become familiar with eggs; this is less extreme than, but in the same spirit as, starting the study of chemistry with quantum mechanics.
What, then, should be the place of probability in beginning instruction in statistics? Our position is not standard, though it is gaining adherents: first courses in statistics should contain essentially no formal probability theory.



Why? First, because informal probability is sufficient for a conceptual grasp of inference. Although the theoretical structure of standard statistical inference is based on probability, the role of probability is limited to answering the question "What would happen if we used this method very many times?" The answer is given by the sampling distribution of a statistic, which records the pattern of variation of the outcomes of, for example, many random samples from the same population. If we agree that actually deriving these distributions is better left to more advanced study, they can be understood as distributions using the tools of data analysis, without the apparatus of formal probability. Rules for $P(A \cup B)$ add very little to a statistics course.
The second reason to avoid formal probability is that probability is conceptually the hardest subject in elementary mathematics. The history of probabilistic ideas (see [16] and [27]) is fascinating but a bit frightening. Better minds than ours long found the subject confusing in the extreme. Psychologists, beginning with Tversky and his collaborators, have demonstrated that confusion persists, even among those who can recite the axioms of formal probability and who can do textbook exercises. Our intuition of random behavior is gravely and systematically defective; see, e.g., [28] and the collection [19]. What is worse, mathematics educators have found no effective way to correct our defective intuition. Garfield and Ahlgren [15] conclude a review of research by stating that "teaching a conceptual grasp of probability still appears to be a very difficult task, fraught with ambiguity and illusion." They suggest study of "how useful ideas of statistical inference can be taught independently of technically correct probability." We believe that concentrating on the idea of a sampling distribution allows this, at least at the depth appropriate for beginners.
The concepts of statistical inference, starting with sampling distributions, are of course also quite tough. We ought to concentrate our attention, and our students' limited patience with hard ideas, on the essential ideas of statistics. We faculty imagine that formal probability illumines those ideas. That's simply not true for almost all of our students.

3.7. What About Mathematics Majors? Mathematics majors traditionally meet statistics as the second course in a year-long sequence devoted to probability and statistical theory. We hope it is clear that we don't regard a tour of sufficient statistics, unbiasedness, maximum likelihood estimators, and the Neyman-Pearson theorem as a promising way to help students understand the core ideas of statistics. On the other hand, mathematics majors should certainly see some of the mathematical structure of statistical inference. What ought we do?
Our preference is to precede the study of theory by a thorough data-oriented introduction to statistical ideas and methods and their applications. That is, mathematics students are not necessarily an exception to the principle that a first introduction to statistics should not be based on formal probability. If the students have strong quantitative backgrounds, a data-oriented course can move quickly enough to present genuinely useful statistics and serious applications. The need for theory can be made clear as we face issues of practice, and the theory makes much more sense when its setting in practice is clear. In many institutions, however, constraints or faculty hesitation make this path difficult. In others, there is little coordination between the "applied" and theoretical courses, so that the latter does not in fact build on the former.
We ought therefore to reconsider what a one-semester introduction to statistics for mathematics majors and other quantitatively strong students should look like.

1997] MATHEMATICS, STATISTICS, AND TEACHING 821
This course would ordinarily and most easily follow a course in probability. Here we encounter another barrier: we can't in good conscience retool both semesters of the standard probability-statistics sequence to optimize the introduction to statistics. Probability is important in its own right, not just as preparation for statistical theory. The more emphasis a department places on applications and modeling in its major curriculum, the more the probability course must play an essential role in this emphasis. An introduction to probability that emphasizes modeling and includes simulation and numerical calculation certainly sets the stage for statistics, but we are hesitant to move any strictly statistical ideas into the probability semester. The reform of probability and the reform of statistics are distinct issues.
Our goal should be an integrated statistics course that moves through data analysis, data production, and inference in turn, emphasizing the organizing principles of each. We should certainly take advantage of and strengthen the students' mathematical capacities. Although data analysis and data production have no unifying theory, mathematical analysis can illumine even data analysis. Here are a few examples.
* A. Consider the optimality properties of measures of center for n observations. The mean minimizes the mean squared error; the median minimizes the mean absolute error (and need not be unique); the midrange minimizes the maximum absolute (or squared) error. Try minimizing the median absolute error for n = 3 and examine the unpleasant behavior of the resulting measure.
* B. Students met the Chebychev inequality while studying probability. Now they may meet the interesting inequality $|\mu - m| \le \sigma$ linking the mean $\mu$, median $m$, and standard deviation $\sigma$ of any distribution [29]. Describe one-sample data by the empirical distribution (probability $1/n$ on each observed point) to draw conclusions about how far apart the sample mean and median may be.
* C. The least-squares regression line is the analog of the mean $\bar{x}$ for predicting y from x. Derive it. Then explore, perhaps using software, analogs of the other measures mentioned in A.
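Example A can be explored numerically before it is proved. The Python sketch below is one way to do that, not material from the article: it scans candidate centers c on a fine grid and reports the minimizer of each criterion. The function names, the grid step of 0.01, and the small data sets are illustrative assumptions. For n = 3 the median absolute error is the second-smallest of the three distances, so it is minimized at the midpoint of the closer pair of observations. The third observation is ignored entirely, and a small shift of the middle point can move the estimate from one end of the data to the other.

```python
from statistics import median

# Loss criteria for a single "center" c of the data xs (Example A).
def mean_squared_error(c, xs):
    return sum((x - c) ** 2 for x in xs) / len(xs)

def mean_absolute_error(c, xs):
    return sum(abs(x - c) for x in xs) / len(xs)

def max_absolute_error(c, xs):
    return max(abs(x - c) for x in xs)

def median_absolute_error(c, xs):
    return median([abs(x - c) for x in xs])

def grid_argmin(loss, xs, step=0.01):
    """Crude numerical minimizer (an illustrative choice): scan c over
    [min(xs), max(xs)] in steps of `step` and keep the best value."""
    lo, hi = min(xs), max(xs)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda c: loss(c, xs))

if __name__ == "__main__":
    xs = [0, 1, 10]  # hypothetical data, n = 3
    criteria = [
        ("mean squared error", mean_squared_error),
        ("mean absolute error", mean_absolute_error),
        ("maximum absolute error", max_absolute_error),
        ("median absolute error", median_absolute_error),
    ]
    for name, loss in criteria:
        print(f"{name}: center near {grid_argmin(loss, xs):.2f}")

    # Median absolute error for n = 3: nudging the middle point across 5
    # makes the minimizer jump from the left pair to the right pair.
    for middle in (4.9, 5.1):
        c = grid_argmin(median_absolute_error, [0, middle, 10])
        print(f"data 0, {middle}, 10: center near {c:.2f}")
```

With the data 0, 1, 10 the scan lands near 3.67 (the mean), 1 (the median), 5 (the midrange), and 0.5. Moving the middle point from 4.9 to 5.1 moves the median-absolute-error center from about 2.45 to about 7.55.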
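The inequality in Example B has a short proof built from facts a student of probability has likely met. The chain below is one standard argument. It is offered as a sketch and is not necessarily the argument given in [29] or in the letters it cites. It assumes X has finite variance, with mean $\mu$, median $m$, and standard deviation $\sigma$.

```latex
% X has mean \mu, median m, and finite standard deviation \sigma.
\begin{align*}
|\mu - m| &= |E(X - m)|   && \text{because } E(X) = \mu \\
          &\le E|X - m|    && \text{absolute value of a mean} \le \text{mean of absolute values} \\
          &\le E|X - \mu|  && \text{a median minimizes } E|X - c| \text{ over all } c \\
          &\le \sqrt{E(X - \mu)^2} = \sigma && \text{Jensen (or Cauchy--Schwarz)}
\end{align*}
% Empirical distribution of x_1, ..., x_n (probability 1/n on each point),
% with sample mean \bar{x} and any sample median M:
\[
|\bar{x} - M| \le \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} (x_i - \bar{x})^2} \le s,
\qquad s^2 = \tfrac{1}{n-1}\textstyle\sum_{i=1}^{n} (x_i - \bar{x})^2 .
\]
```

The empirical distribution's standard deviation uses divisor n, so it is no larger than the usual s. The conclusion students can draw is that the sample mean and median are never more than one standard deviation apart.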
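Example C asks students to derive the least-squares line and then explore analogs of the criteria in Example A. The Python sketch below is a minimal illustration, not the authors' material. It uses the standard closed form obtained by setting the partial derivatives of the residual sum of squares to zero: $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. As one possible analog of the median, it adds a least-absolute-deviations (LAD) line. This line is found by brute force, relying on the fact that some minimizing L1 line passes through two of the data points whenever the x-values are not all equal. The function names and the small data set with one outlier are hypothetical.

```python
from itertools import combinations

def least_squares_line(xs, ys):
    """Intercept a and slope b minimizing the sum of squared residuals.

    Setting both partial derivatives of sum((y - a - b*x)**2) to zero gives
    b = Sxy / Sxx and a = ybar - b * xbar.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

def sum_abs_residuals(a, b, xs, ys):
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))

def lad_line(xs, ys):
    """Least-absolute-deviations line (the 'median' analog), by brute force.

    Some minimizing line passes through two data points, so it suffices to
    try the line through every pair with distinct x-values. This is O(n^3)
    work, fine for classroom-sized data. Assumes two distinct x-values.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum_abs_residuals(a, b, xs, ys)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

if __name__ == "__main__":
    # Hypothetical data: four points on y = 2x plus one outlier at x = 5.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 6, 8, 30]
    a_ls, b_ls = least_squares_line(xs, ys)
    a_lad, b_lad = lad_line(xs, ys)
    print(f"least squares: y = {a_ls:g} + {b_ls:g} x")
    print(f"least absolute deviations: y = {a_lad:g} + {b_lad:g} x")
    # Like deviations from the mean, least-squares residuals sum to zero.
    print(sum(y - (a_ls + b_ls * x) for x, y in zip(xs, ys)))
```

For these data, least squares gives y = -8 + 6x because the outlier pulls it strongly. The LAD line is y = 2x through the other four points, which mirrors the contrast between mean and median in A. The minimax line, the midrange analog, is a small linear program and is not shown here.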
Data production lends itself to probability calculations that illustrate how likely it is that random assignments will be unbalanced in specific ways; the advantages of large samples soon become clear.
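The hypergeometric distribution makes the data-production calculation above concrete. The sketch below uses a hypothetical setting that does not come from the article. N subjects (N even), t of whom share some trait such as smoking, are split at random into two groups of N/2. The question is how likely it is that the proportion with the trait differs between the groups by at least a given threshold. The function name and parameters are illustrative, and exact `Fraction` arithmetic avoids rounding in the probabilities.

```python
from fractions import Fraction
from math import comb

def prob_imbalance(n_total, n_trait, threshold):
    """Exact probability that randomly splitting n_total subjects into two
    equal groups leaves the trait proportions differing by >= threshold.

    The number x of trait carriers in the first group is hypergeometric:
    P(X = x) = C(n_trait, x) * C(n_total - n_trait, m - x) / C(n_total, m).
    """
    assert n_total % 2 == 0, "this sketch assumes two equal groups"
    m = n_total // 2
    favorable = 0
    for x in range(max(0, n_trait - m), min(n_trait, m) + 1):
        # Group 1 has x carriers and group 2 has n_trait - x, each out of m.
        gap = abs(x - (n_trait - x)) / m
        if gap >= threshold:
            favorable += comb(n_trait, x) * comb(n_total - n_trait, m - x)
    return Fraction(favorable, comb(n_total, m))

if __name__ == "__main__":
    # Half the subjects carry the trait. How often do the two groups'
    # trait proportions differ by 40 percentage points or more?
    for n in (20, 40, 100):
        p = prob_imbalance(n, n // 2, 0.4)
        print(n, float(p))
```

For N = 20 the exact probability is 33052/184756, about 0.18, so a 40-point imbalance is far from rare in a small experiment. The probability falls off rapidly as N grows.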
Very nice. We can give our students a balanced introduction to statistics that makes use of their knowledge of mathematics. The inevitable consequence is that we spend less time on inference. We must decide what to preserve and what to cut. There is as yet no consensus, because, despite much grumbling, the reform of the math major sequence has not yet begun. Imagining such a reform is a good place to end a discussion of statistics, mathematics, and teaching. This is your take-home exam: design a better one-semester statistics course for mathematics majors.
REFERENCES

1. Aldous, David (1994), Triangulating the circle, at random, Amer. Math. Monthly 101, 223-233. The remark appears in the biographical note accompanying the paper.
2. Bjerkedal, T. (1960), Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli, American Journal of Hygiene 72, 130-148.
3. Boyer, Paul and Stephen Nissenbaum (1972), Salem Village Witchcraft. Belmont, CA: Wadsworth Publishing Co.
4. Boyer, Paul and Stephen Nissenbaum (1974), Salem Possessed. Cambridge, MA: Harvard University Press.
5. Bullock, James O. (1994), Literacy in the language of mathematics, Amer. Math. Monthly 101, 735-743.
6. Bunker, John P., Benjamin A. Barnes, and Frederick Mosteller (eds.) (1977), Costs, Risks and Benefits of Surgery. New York: Oxford University Press.
7. Chambers, John M., William S. Cleveland, Beat Kleiner, and Paul A. Tukey (1983), Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.
8. Chambers, John M. and Trevor J. Hastie (1992), Statistical Models in S. Pacific Grove, CA: Wadsworth.
9. Cleveland, William S. and Mary E. McGill (eds.) (1988), Dynamic Graphics for Statistics. Belmont, CA: Wadsworth.
10. Cobb, George W. (1991), Teaching statistics: more data, less lecturing, Amstat News, December 1991, pp. 1, 4.
11. Cobb, George W. (1992), Teaching statistics, in L. A. Steen (ed.), Heeding the Call for Change: Suggestions for Curricular Action, MAA Notes 22. Washington, DC: Mathematical Association of America.
12. Cochran, W. G. (1968), The effectiveness of adjustment by subclassification in removing bias in observational studies, Biometrics 24, 205-213.
13. Crystal, David (ed.) (1994), The Cambridge Factfinder. Cambridge: Cambridge University Press, pp. 174-175.
14. Efron, Bradley and Rob Tibshirani (1991), Statistical data analysis in the computer age, Science 253, 390-395.
15. Garfield, Joan and Andrew Ahlgren (1988), Difficulties in learning basic concepts in probability and statistics: implications for research, Journal for Research in Mathematics Education 19, 44-63.
16. Gigerenzer, G., Z. Swijtink, T. Porter, L. Daston, J. Beatty, and L. Kruger (1989), The Empire of Chance. Cambridge: Cambridge University Press.
17. Hoaglin, D. C. (1992), Diagnostics, in D. C. Hoaglin and D. S. Moore (eds.), Perspectives on Contemporary Statistics, MAA Notes 21. Washington, DC: Mathematical Association of America, pp. 123-144.
18. Hoaglin, David C. and David S. Moore (eds.) (1992), Perspectives on Contemporary Statistics, MAA Notes 21. Washington, DC: Mathematical Association of America.
19. Kapadia, R. and M. Borovcnik (eds.) (1991), Chance Encounters: Probability in Education. Dordrecht: Kluwer.
20. Longfellow, Henry Wadsworth (1847), Evangeline, Introduction, 1.1.
21. Lyle, Roseann M. et al. (1987), Blood pressure and metabolic effects of calcium supplementation in normotensive white and black men, Journal of the American Medical Association 257, 1772-1776. Dr. Lyle provided the data in the example.
22. Moore, David S. (1988), Should mathematicians teach statistics (with discussion), College Math. Journal 19, 3-7.
23. Moore, David S. (1992), What is statistics? in David C. Hoaglin and David S. Moore (eds.), Perspectives on Contemporary Statistics, MAA Notes 21. Washington, DC: Mathematical Association of America, pp. 1-18.
24. Moore, David S. (1992), Teaching statistics as a respectable subject, in Florence Gordon and Sheldon Gordon (eds.), Statistics for the Twenty-First Century, MAA Notes 26. Washington, DC: Mathematical Association of America.
25. Moore, David S. (1995), The Basic Practice of Statistics. New York: W. H. Freeman.
26. Rosenbaum, Paul R. (1995), Observational Studies. New York: Springer-Verlag, p. 60.
27. Stigler, S. M. (1986), The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, Mass: Belknap.
28. Tversky, Amos and Daniel Kahneman (1983), Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment, Psychological Review 90, 293-315.
29. Watson, G. S. (1994), letter to the editor, The American Statistician 48, p. 269. This is the last in a sequence of comments on this inequality, and contains references to the earlier contributions.
30. Weisberg, Sanford (1985), Applied Linear Regression, 2nd edition. New York: John Wiley and Sons, p. 230.

Department of Mathematics, Statistics, and Computer Science
Mount Holyoke College
South Hadley, MA 01075
[email protected]

Department of Statistics
Purdue University
West Lafayette, IN 47907
[email protected]
