The Elements Statistic: Appendix A


Appendix A

The Elements of Statistics

A.1. INTRODUCTION

This appendix is intended to remind you of some of the basic ideas,
concepts, and results of classical statistics, which you may have forgotten.
If you have never encountered any of these ideas before, this is not the
place to start; you really need to read one of the hundreds of introductory
statistics texts or take a class in introductory statistics. If you have
taken any introductory statistics class, what follows should be reasonably
familiar. Most of the information in this appendix is useful background for
the main text, although you can probably survive without a detailed,
in-depth knowledge of it all. A good geographical introduction to many of
these ideas, which also introduces some of the more spatial issues that we
focus on in this book, is Peter Rogerson's Statistical Methods for
Geography (2001).

We may introduce different terminology and symbols from those you
have encountered elsewhere, so you should get used to those used here,
as they appear in the main text. Indeed, the presentation of this book may
be more mathematical in places than you are used to, so we start with
some notes on mathematical notation. This is not intended to put you off,
and it really shouldn't. Many of the concepts of spatial analysis are
difficult to express concisely without mathematical notation. Therefore, you
will get much further if you put a little effort into coming to grips with the
notation. The effort will also make it easier to understand the spatial
analysis literature, since journal articles and most textbooks simply
assume that you know these things. They also tend to use slightly
different symbols each time, so it's better if you have an idea of the
principle behind the notation.

Preliminary Notes on Notation

A single instance of some variable or quantity is usually denoted by a
lowercase italicized letter symbol. Sometimes the symbol might be the initial
letter of the quantity we're talking about, say h for height or d for distance.
More often, in introducing a statistical measure, we don't really care what
the numbers represent, because they could be anything, so we just use one of
the commonly used mathematical symbols, say x or y. Commonly used let-
ters are x, y, z, n, m, and k. In the main text, these occur frequently and
generally have the meanings described in Table A.1. In addition to these six,
you will note that d, w, and s also occur frequently in spatial analysis. The
reason for the use of boldface type for s is made clear in Appendix B, where
vectors and matrices are discussed.
A familiar aspect of mathematical notation is that letters from the Greek
alphabet are used alongside the Roman alphabet letters that you are used
to. You may already be familiar with mu ($\mu$) for a population mean, sigma
($\sigma$) for population standard deviation, chi ($\chi$) for a particular
statistical distribution, and pi ($\pi$) for ... well, just for pi. In
general, we try to avoid using any more Greek symbols than these. The reason
for introducing symbols is so that we can use mathematical notation to talk
about related values or to indicate mathematical operations that we want to
perform on sets of values. So if h (or z) represents our height values,
$h^2$ (or $z^2$) indicates height value squared. The symbols are a very
concise way of saying the same thing, and that's very important when we
describe more complex operations on data sets.

Table A.1 Commonly Used Symbols and Their Meaning in This Book

Symbol     Meaning
x          The "easting" geographical coordinate or a general data value
y          The "northing" geographical coordinate or a general data value
z, a, b    The numerical value of some measurement recorded at the
           geographical coordinates (x, y)
n, m       The number of observations in a data set
k          Either an arbitrary constant or, sometimes, the number of entities
           in a spatial neighborhood
d          Distance
w          The strength or weight of interaction between locations
s          An arbitrary (x, y) location

Two symbols that you will see a lot are i and j. However, i and j normally
appear in a particular way. To describe complex operations on sets of
values, we need another notational device: subscripts. Subscripts are small
italic letters or numbers below and to the right of normal mathematical
symbols: The i in $z_i$ is a subscript. A subscript is used to signify that
there may be more than one item of the type denoted by the symbol, so $z_i$
stands in for a series or set of z values: $z_1$, $z_2$, $z_3$, and so on.
This has various uses:

• A set of values is written between braces, so that
  $\{z_1, z_2, \ldots, z_{n-1}, z_n\}$ tells us that there are n elements in
  this set of z values. If required, the set as a whole may be denoted by a
  capital letter: Z. A typical value from the set Z is denoted $z_i$, and we
  can abbreviate the previous partial listing of z's to simply
  $Z = \{z_i\}$, where it is understood that the set has n elements.
• In spatial analysis, it is common for the subscripts to refer to
  locations at which observations have been made and for the same
  subscripts to be used across a number of different data sets. Thus, $h_7$
  and $t_7$ refer to the values of two different observations (say, height
  and temperature) at the same location (i.e., location number 7).
• Subscripts may also be used to distinguish different calculations of
  (say) the same statistic on different populations or samples. Thus,
  $\mu_A$ and $\mu_B$ would denote the means of two different data sets,
  A and B.

The symbols i and j usually appear as subscripts in one or other of these
ways. A particularly common usage is to denote summation operations,
which are indicated by use of the $\Sigma$ symbol (another Greek letter, this
time capital sigma). This is where subscripts come into their own, because
we can specify a range of values that are summed to produce a result. Thus,
the sum

$$a_1 + a_2 + a_3 + a_4 + a_5 + a_6 \tag{A.1}$$

is denoted

$$\sum_{i=1}^{i=6} a_i \tag{A.2}$$

indicating that summation of a set of a values should be carried out on all
the elements from $a_1$ to $a_6$. For a set of n "a" values this becomes

$$\sum_{i=1}^{i=n} a_i \tag{A.3}$$

which is usually abbreviated to either

$$\sum_{i=1}^{n} a_i \tag{A.4}$$

or to

$$\sum_{i} a_i \tag{A.5}$$

where the number of values in the set of a's is understood to be n. If
instead of the simple sum we wanted the sum of the squares of the a values
we have

$$\sum_{i=1}^{n} a_i^2 \tag{A.6}$$

instead. Or perhaps we have two data sets, A and B, and we want the sum
of the products of the a and b values at each location. This would be
denoted

$$\sum_{i=1}^{n} a_i b_i \tag{A.7}$$

In spatial analysis, more complex operations might be carried out between
two sets of values, and we may then need two summation operators. For
example,

$$c = k \sum_{i=1}^{n} \sum_{j=1}^{n} (z_i - z_j)^2 \tag{A.8}$$

indicates that c is to be calculated in two stages. First, we take each z
value in turn (the outer i subscript) and sum the square of its value minus
every z value in turn (the j subscript). You can figure this out by
imagining first setting i to 1 and calculating the inner sum, which would be
$\sum_j (z_1 - z_j)^2$. We then set i to 2 and do the summation
$\sum_j (z_2 - z_j)^2$, and so on, all the way to $\sum_j (z_n - z_j)^2$.
The final double summation is the sum of all of these individual sums, and c
is equal to this sum multiplied by k. This will seem complex at first, but
you will get used to it.

In the next section you will see immediately how these notational tools
make it easy to write down operations like finding the mean value of a data
set. Other elements of notation will be introduced as they are required and
explained in the appendices and main text.
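Summation notation maps directly onto loops, which can make it easier to read. The following is a minimal Python sketch, assuming some made-up values: the lists `a`, `b`, and `z` and the constant `k` are hypothetical and do not come from the text. It evaluates the simple sum, the sum of squares, the sum of products, and a double summation over i and j.

```python
# Hypothetical example values (not taken from the text).
a = [2.0, 4.0, 6.0]
b = [1.0, 3.0, 5.0]
z = [1.0, 2.0, 4.0]
k = 0.5  # an arbitrary constant

# Simple sum of the a values: sum over i of a_i
simple_sum = sum(a)

# Sum of squares: sum over i of a_i squared
sum_of_squares = sum(a_i ** 2 for a_i in a)

# Sum of products of paired values: sum over i of a_i * b_i
sum_of_products = sum(a_i * b_i for a_i, b_i in zip(a, b))

# Double summation: for each z_i (outer index i), sum the squared
# difference from every z_j (inner index j), then multiply by k.
inner_sums = [sum((z_i - z_j) ** 2 for z_j in z) for z_i in z]
c = k * sum(inner_sums)

print(simple_sum, sum_of_squares, sum_of_products, inner_sums, c)
```

Keeping the inner sums in a list mirrors the verbal description: one inner sum for each value of i, and the double summation is the total of those.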

A.2. DESCRIBING DATA

The most fundamental operation in statistics is describing data. The mea-
sures described below are commonly used to describe the overall character-
istics of a set of data.

Population Parameters
These are presented without comment. A population mean $\mu$ is given by

$$\mu = \frac{1}{n}\sum_{i=1}^{n} a_i \tag{A.9}$$

population variance $\sigma^2$ is given by

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (a_i - \mu)^2 \tag{A.10}$$

and population standard deviation $\sigma$ is given by

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a_i - \mu)^2} \tag{A.11}$$

These statistics are referred to as population parameters.

Sample Statistics
The statistics above are based on the entire population of interest, which
may not be known. Most descriptive statistics are calculated for a sample of
the entire population. They therefore have two purposes: First, they are
summary descriptions of the sample data, and second, they serve as
estimates of the corresponding population parameters. The sample mean is

$$\bar{a} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} a_i \tag{A.12}$$

the sample variance $s^2$ is

$$s^2 = \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (a_i - \bar{a})^2 \tag{A.13}$$

and the sample standard deviation s is given by

$$s = \hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (a_i - \bar{a})^2} \tag{A.14}$$

In all these expressions the ^ ("hat") symbol indicates that the expression
is an estimate of the corresponding population parameter. Note the different
symbols for these sample statistics relative to the corresponding population
parameters.

Sample statistics may be used as unbiased estimators of the corresponding
population parameters, and a major part of inferential statistics is
concerned with determining how good an estimate a sample statistic is of the
corresponding population parameter. Note particularly the denominator
(n - 1) in the sample variance and standard deviation statistics. This
reflects the loss of one degree of freedom (df) in the sample statistic case
because we know that $\sum_i (a_i - \bar{a}) = 0$, so that given the values
of $\bar{a}$ and $a_1, \ldots, a_{n-1}$, the value of $a_n$ is known. The
expressions shown, using the (n - 1) denominator, are known to produce
better estimates of the corresponding population parameters than those
obtained using n.

Together, the mean and variance or standard deviation provide a convenient
summary description of a data set. The mean is a measure of central
tendency and is a typical value somewhere in the middle of all the values in
the data set. The variance and standard deviation are measures of spread,
indicative of how dispersed are the values in the data set.
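The difference between the n and (n - 1) denominators is easy to see numerically. The sketch below is illustrative only: the data values and the helper names `mean` and `variance` are assumptions, not part of the text. It computes the variance both ways and checks that the deviations from the mean sum to zero, which is the constraint behind the lost degree of freedom.

```python
import math

def mean(values):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)

def variance(values, sample=True):
    """Variance with an (n - 1) denominator if sample=True, else n."""
    m = mean(values)
    sum_sq_dev = sum((v - m) ** 2 for v in values)
    denominator = len(values) - 1 if sample else len(values)
    return sum_sq_dev / denominator

# Hypothetical data set (not taken from the text).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

a_bar = mean(data)
pop_var = variance(data, sample=False)   # divides by n
samp_var = variance(data, sample=True)   # divides by n - 1
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# Deviations from the mean always sum to zero, so once the mean and
# n - 1 of the values are known, the last value is determined.
sum_of_deviations = sum(v - a_bar for v in data)

print(a_bar, pop_var, samp_var, pop_sd, samp_sd, sum_of_deviations)
```

For any data set with more than one distinct value, the (n - 1) version is slightly larger than the n version, and the gap shrinks as n grows.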

The z-score of a value $a_i$ relative to its population is given by

$$z_i = \frac{a_i - \mu}{\sigma} \tag{A.15}$$

and relative to a sample by

$$z_i = \frac{a_i - \bar{a}}{s} \tag{A.16}$$

The z-score indicates the place of a particular value in a data set relative
to the mean, standardized with respect to the standard deviation. z = 0 is
equivalent to the sample mean, z > 0 is a value greater than the mean,
and z < 0 is less than the mean. The z-score is used extensively in
determining confidence intervals and in assessing statistical significance.
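As a rough illustration of z-scores, the sketch below standardizes a small made-up data set both ways, using Python's standard `statistics` module (`pstdev` divides by n, `stdev` by n - 1). The data values are hypothetical.

```python
import statistics

# Hypothetical data set (not taken from the text).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(data)          # mean of the values
sigma = statistics.pstdev(data)     # population standard deviation
s = statistics.stdev(data)          # sample standard deviation

# z-scores relative to the population and relative to the sample.
z_population = [(a - mu) / sigma for a in data]
z_sample = [(a - mu) / s for a in data]

for a, zp, zs in zip(data, z_population, z_sample):
    print(f"a = {a:4.1f}   z (population) = {zp:6.3f}   z (sample) = {zs:6.3f}")
```

Values below the mean get negative z-scores and values above it get positive ones; a value equal to the mean has z = 0.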

Median, Percentiles, Quartiles, and Box Plots

Other descriptive statistics are based on sorting the values in a data set
into numerical order and describing them according to their position in the
ordered list. The first percentile in a data set is the value below which
1%, and above which 99%, of the data values are found. Other percentiles
are defined similarly, and certain percentiles are frequently used as
summary statistics. The 50th percentile is the median, sometimes denoted M.
Half the values in a data set are below the median and half are above. Like
the mean, the median is a measure of central tendency. Comparison of the
mean and median may indicate whether or not a data set is skewed. If
$\bar{a} > M$, this indicates that high values in the data set are pulling
the mean above the median; such data are right skewed. Conversely, if
$\bar{a} < M$, a few low values may be 'pulling' the mean below the median
and the data are left skewed.
Skewed data sets are common in human geography. A good example is
often provided by ethnicity data in administrative districts. For example,
Figure A.1 is a histogram for the African-American percentage of population
in the 67 Florida counties as estimated for 1999. The strong right skew
in these data is illustrated by the histogram, with almost half of all the
counties having African-American populations of 10% or less. The right
skew is confirmed by the mean and median of these data. The median
percent African-American is 11.65%, whereas the mean value is higher at
14.17%, and the small number of counties with higher percentages of
African-Americans pull the mean value up relative to the median. The
median often gives a better indication of what constitutes a typical value
in a data set.

[Histogram: "Florida by county", n = 67; horizontal axis: Percent African-American]
Figure A.1 Example of right skewed data in human geography.

Two other percentiles are frequently reported. These are the lower and
upper quartiles of a data set, which are the 25th and 75th percentiles,
respectively. If we denote these values by $Q_{25}$ and $Q_{75}$
respectively, the interquartile range (IQR) of a data set is given by

$$\mathrm{IQR} = Q_{75} - Q_{25} \tag{A.17}$$

The interquartile range contains half the data values in a data set and is
indicative of the range of values. The interquartile range, as a measure of
data spread, is less affected by extreme values than are simpler measures
such as the range (the maximum value minus the minimum value) or even
the variance and standard deviation.
A useful graphic that gives a good summary picture of a data set is a box
plot. A number of variations on the theme exist (so that it is important to
state the way in which any plot that you present is defined), but the
diagram on the left-hand side of Figure A.2 is typical. This plot summarizes
the same Florida percent African-American data as in Figure A.1. The box
itself is drawn to extend from the lower to the upper quartile value. The
horizontal line near the center of the box indicates the median value. The
whiskers on the plot extend to the lowest and highest data values within
one-and-a-half IQRs below $Q_{25}$ and above $Q_{75}$. Any values beyond
these limits, that is, less than $Q_{25} - 1.5(\mathrm{IQR})$ or greater
than $Q_{75} + 1.5(\mathrm{IQR})$, are regarded as outliers and marked
individually with point symbols. If either the minimum or maximum data value
lies inside the 1.5 IQR limits, the fences at the ends of the whiskers are
drawn at the minimum or maximum value as appropriate. This presentation
gives a good general picture of the data distribution. In the example
illustrated on the left, the minimum value is greater than
$Q_{25} - 1.5(\mathrm{IQR})$, at around 2%, and is marked by the lower
fence. There are four outlier values above $Q_{75} + 1.5(\mathrm{IQR})$,
three at around 40%, and one at around 55%.

Several data sets may be compared using parallel box plots. This has been
done on the right-hand side of Figure A.2, where Florida county data for
percent Hispanic and percent white have been added to the plot. Note that
box plots may also be drawn horizontally, as here. This example shows that
the African-American and Hispanic population distributions are both right
skewed, although the typical variability among African-American populations
is greater. The white population distribution is left skewed, on the
other hand, and has higher typical values.

[Box plots; left: African-American, with box and fences labeled; right: parallel horizontal box plots including % White; horizontal axis: % of population]
Figure A.2 Box plots of Florida ethnicity data.
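The quantities that define a box plot can be computed directly. The sketch below is only illustrative: the data values are made up, and quartile conventions differ between software packages, so the "exclusive" method of `statistics.quantiles` (Python 3.8 or later) is an assumption rather than the definition used for the figures here.

```python
import statistics

# Hypothetical data: ten evenly spread values plus one high value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30])

# Lower quartile, median, and upper quartile (one of several conventions).
q25, median, q75 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q75 - q25

# Values beyond 1.5 IQRs from the box are treated as outliers.
lower_limit = q25 - 1.5 * iqr
upper_limit = q75 + 1.5 * iqr
outliers = [v for v in data if v < lower_limit or v > upper_limit]

# Fences are drawn at the most extreme data values inside the limits.
inside = [v for v in data if lower_limit <= v <= upper_limit]
lower_fence, upper_fence = min(inside), max(inside)

# Comparing mean and median gives a quick indication of skew.
mean = statistics.mean(data)
if mean > median:
    skew = "right"
elif mean < median:
    skew = "left"
else:
    skew = "none"

print(q25, median, q75, iqr, (lower_fence, upper_fence), outliers, skew)
```

Here the single high value is flagged as an outlier and pulls the mean above the median, while the IQR is unaffected by it.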

A.3. PROBABILITY THEORY

A great deal of statistics depends on the ideas of probability theory.
Probability theory is a mathematical way of dealing with unpredictable
events. It enables us to assign probabilities to events on a scale from 0
(will never happen) to 1 (will definitely happen). The most powerful aspect
of probability theory is that it provides standard ways of calculating the
probability of complex composite events (for example, A and B happening
when C does not happen) given estimates for the probability of each of the
individual events A, B, and C happening on its own.

In probability theory, an event is defined as a collection of observations
in which we are interested. To calculate the probability of an event, we
first enumerate all the possible observations and count them. Then we
determine how many of the possible observations satisfy the conditions for
the event we are interested in to have occurred. The probability of the
event is the number of outcomes that satisfy the event definition, divided
by the total number of possible outcomes. For example, the probability of
you winning the big prize in a lottery is given by

$$P(\text{lottery win}) = \frac{\text{number of ways you win}}{\text{number of combinations that could come up}} \tag{A.18}$$

Note that the notation P(A) is read as "the probability of event A
occurring." Since the number of possible ways that you can win (with one
ticket) is 1, and the number of possible combinations of numbers that could
be drawn is usually very large (in the UK national lottery, it is
13,983,816), the probability of winning the lottery is usually very small.
In the UK national lottery it is

$$P(\text{lottery win}) = \frac{1}{13{,}983{,}816} = 0.000000071511 \tag{A.19}$$

which is practically 0, meaning that it probably won't be you.
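The figure of 13,983,816 can be checked by counting. The sketch below assumes the draw picks 6 numbers from 49 without regard to order; that format is not stated in the text and is an assumption here, although it does reproduce the quoted count. `math.comb` requires Python 3.8 or later.

```python
import math

# Assumed lottery format (not stated in the text): 6 numbers from 49.
numbers_in_draw = 49
numbers_chosen = 6

# Number of equally likely combinations that could come up.
n_combinations = math.comb(numbers_in_draw, numbers_chosen)

# One ticket matches exactly one of those combinations.
ways_to_win = 1
p_win = ways_to_win / n_combinations

print(n_combinations)        # count of possible draws
print(f"{p_win:.12f}")       # probability of winning with one ticket
```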
Here are some basic probability results with which you should be familiar.
For an event A and its complement NOT A, the total probability is
always 1:

$$P(A) + P(\text{NOT } A) = 1 \tag{A.20}$$

You can think of this as the something must happen rule, because it follows
from the fact that a well-defined event A will either happen or not happen,
since it can't "sort of" happen. This rule is probably obvious, but it is
useful to remember, because it is often easier to enumerate observations
that do not constitute the event of interest occurring than those which do,
that is, to calculate P(NOT A), from which it is easy to determine P(A). An
example of this is: What is the probability of any two students in a class
of 25 sharing a birthday? This is a hard question until you realize that it
is easier to calculate the opposite probability: that no two students in
the class share the same birthday. Each student after the first can have a
birthday on one of only 365, then 364, then 363, and so on, of the remaining
"unused" days in the year if they are not to share a birthday with a student
considered previously. This gives the probability that no two students share
a birthday as $(365/366) \times (364/366) \times \cdots \times (342/366)
\approx 0.432$, so that the probability that any two students will share a
birthday is 1 - 0.432 = 0.568.
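The birthday calculation is tedious by hand but short as a loop. This is a minimal sketch following the text's approach of computing the complement first; the function name `prob_shared_birthday` is hypothetical, and the 366-day year follows the fractions used in the text.

```python
def prob_shared_birthday(class_size, days_in_year=366):
    """Probability that at least two people in the class share a birthday.

    Computes P(NOT A), the probability that no two share a birthday,
    and returns 1 - P(NOT A).
    """
    p_no_shared = 1.0
    # The k-th student after the first must avoid k days already used.
    for k in range(1, class_size):
        p_no_shared *= (days_in_year - k) / days_in_year
    return 1.0 - p_no_shared

p_shared = prob_shared_birthday(25)
p_none = 1.0 - p_shared

print(f"P(no shared birthday) = {p_none:.3f}")
print(f"P(shared birthday)    = {p_shared:.3f}")
```

Changing `class_size` shows how quickly the probability rises as the class grows.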
For two events A and B, the probability of either event occurring, denoted
$P(A \cup B)$, is given by

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{A.21}$$

where $P(A \cap B)$ denotes the probability of both A and B occurring. In
the special case where two events are mutually exclusive and cannot occur
together, $P(A \cap B) = 0$, so that

$$P(A \cup B) = P(A) + P(B) \tag{A.22}$$

For example, if A is the event "drawing a face card from a deck of cards",
and B is the event "drawing an ace from a deck of cards," the events are
mutually exclusive, since a card cannot be an ace and a face card
simultaneously. We have $P(A) = \frac{3}{13}$, $P(B) = \frac{1}{13}$, so
that $P(A \cup B) = \frac{4}{13} = 30.8\%$. On the other hand, if A is the
event "drawing a red card from a deck of cards" and B is as before, A and B
are no longer mutually exclusive, since there are two red aces in the pack.
The various probabilities are now $P(A) = \frac{26}{52} = \frac{1}{2}$,
$P(B) = \frac{1}{13}$, and $P(A \cap B) = \frac{2}{52} = \frac{1}{26}$, so
that $P(A \cup B)$, the probability of drawing a card that is either red or
an ace, is $\frac{1}{2} + \frac{1}{13} - \frac{1}{26} = \frac{7}{13} =
53.8\%$.
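The card examples can be verified by enumerating a standard 52-card deck and counting. The sketch below uses exact fractions; the helper names (`prob`, `is_face`, `is_ace`, `is_red`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Number of cards satisfying the event divided by the deck size."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_ace(card):
    return card[0] == "A"

def is_red(card):
    return card[1] in {"hearts", "diamonds"}

# Mutually exclusive events: face card or ace.
p_face_or_ace = prob(lambda c: is_face(c) or is_ace(c))

# Overlapping events: red card or ace (there are two red aces).
p_red_and_ace = prob(lambda c: is_red(c) and is_ace(c))
p_red_or_ace = prob(lambda c: is_red(c) or is_ace(c))

print(p_face_or_ace, float(p_face_or_ace))
print(p_red_and_ace, p_red_or_ace, float(p_red_or_ace))
```

Counting the union directly gives the same answer as adding the two probabilities and subtracting the overlap.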
Conditional probability refers to the probability of events given certain
preconditions. The probability of A given B, written P(A : B), is given by

$$P(A : B) = \frac{P(A \cap B)}{P(B)} \tag{A.23}$$

This is obvious if you think about it. If B must happen, P(B) is
proportional to the number of all possible outcomes. Similarly,
$P(A \cap B)$ is proportional to the number of events that count, that is,
those where A has occurred given that B has also occurred. Equation (A.23)
then follows as a direct consequence of our definition of probability.
A particularly important concept is event independence. Two events are
independent if the occurrence of one has no effect at all on the likelihood
of the other. In this case,

$$P(A : B) = P(A) \qquad P(B : A) = P(B) \tag{A.24}$$

Inserting the first of these into equation (A.23) gives us

$$P(A) = \frac{P(A \cap B)}{P(B)} \tag{A.25}$$

which we can rearrange to get the important result for independent events
A and B that

$$P(A \cap B) = P(A)P(B) \tag{A.26}$$

This result is the basis for the analytic calculation of many results for
complex probabilities and one reason why the assumption of event
independence is often important in statistics. As an example of independent
events, think of two dice being rolled simultaneously. If event A is "die 1
comes up a six" and event B is "die 2 comes up a six," the events are
independent, since the outcome on one die can have no possible effect on
the outcome of the other. Thus the probability that both dice come up six is
$P(A)P(B) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} = 2.78\%$.
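The two-dice example can be checked by listing all 36 equally likely outcomes. The sketch below (helper names are hypothetical) computes the conditional probability from counts and confirms that, for these independent events, it equals the unconditional probability.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Share of outcomes that satisfy the event."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def event_a(outcome):   # die 1 comes up a six
    return outcome[0] == 6

def event_b(outcome):   # die 2 comes up a six
    return outcome[1] == 6

p_a = prob(event_a)
p_b = prob(event_b)
p_a_and_b = prob(lambda o: event_a(o) and event_b(o))

# Conditional probability of A given B: P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(p_a, p_b, p_a_and_b, p_a_given_b, float(p_a_and_b))
```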

Calculation of Permutations and Combinations

A very common requirement in probability calculations is to determine the
number of possible permutations or combinations of n elements in various
situations. Permutations are sets of elements where the order in which they
are arranged is regarded as significant, so that ABC is regarded as
different from CBA. When we are counting combinations, ABC and CBA are
equivalent outcomes. There are actually six permutations of these three
elements: ABC, ACB, BAC, BCA, CAB, and CBA, but they all count as only one
combination. The number of permutations of k elements taken from a set
of n elements, without replacement, is given by

$$P_k^n = \frac{n!}{(n-k)!} \tag{A.27}$$

where x! denotes the factorial of x and is given by
$x \times (x-1) \times (x-2) \times \cdots \times 3 \times 2 \times 1$, and
0! is defined equal to 1. The equivalent expression for the number of
combinations of k elements, which may be chosen from a set of n elements, is

$$C_k^n = \binom{n}{k} = \frac{n!}{k!(n-k)!} \tag{A.28}$$

These two expressions turn out to be important in many situations, and we
will use the combinations expression to derive the expected frequency
distribution associated with complete spatial randomness (see Chapter 3).
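The ABC example is small enough to list exhaustively. The sketch below uses Python's `itertools` to enumerate the arrangements and compares the counts with the factorial expressions; the variable names are illustrative only.

```python
import math
from itertools import combinations, permutations

elements = "ABC"
n = len(elements)   # size of the set
k = 3               # number of elements taken

# Enumerate arrangements: order matters for permutations only.
perms = ["".join(p) for p in permutations(elements, k)]
combs = ["".join(c) for c in combinations(elements, k)]

# Counts from the factorial expressions.
n_perms = math.factorial(n) // math.factorial(n - k)
n_combs = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(perms, n_perms)
print(combs, n_combs)
```

In Python 3.8 and later, `math.perm(n, k)` and `math.comb(n, k)` give the same counts directly.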

Word of Warning about Probability Theory
The power of probability theory comes at a price: We have to learn to think
of events in a very particular way, a way that may not always be applicable.
The problem lies in the fact that probability theory works best in a world
of repeatable observations, the classic examples being the rolling of dice
and the flipping of coins. In this world we assign definite probabilities to
outcomes based on simple calculations (the probability of rolling a six is
1/6, the probability of heads is 1/2) and over repeated trials (many rolls
of the die, or many flips of a coin) we expect outcomes to match these
calculations. If they don't match, we suspect a loaded die or unfair coin.
In fact, this is a very particular concept of probability. There are at
least three distinct uses of the term:

1. A priori or theoretical, where we can precisely calculate probabilities
   ahead, based on the "physics." This is the probability associated with
   dice, coins, and cards.
2. A posteriori probability is often used in geography. The assumption
   is that historical data may be projected forward in time in a predictive
   way. When we go on a trip and consult charts of average July
   temperatures in California, we are using this type of probability in an
   informal way.
3. Subjective probability is more about hunches and guesswork. "The
   Braves have a 10% chance of winning the World Series this year,"
   "Middlesbrough has a 10% chance of winning the FA Cup this
   season," or whatever.

There are, however, no hard-and-fast rules for distinguishing these
different "flavors" of probability.
In the real world, especially in social science, data are once-off and
observational, with no opportunity to conduct repeated trials. In treating
sample observations as typical of an entire population, we make some
important assumptions about the nature of the world and of our
observations, in particular that the world is stable between observations
and that our observations are a representative (random) sample. There are
many cases where this cannot be true. The assumptions are especially dubious
where data are collected for a localized area, because then the sample is
only representative locally, and we must be careful about any statistically
based claims we make.

Probability theory forms a basis for calculation of the likely outcomes of
processes. A process may be summarized by a random variable. Note that
this does not imply that a process is random, just that its outcomes may be
modeled as if it were. A random variable is defined by a set of possible
outcomes $\{a_1, \ldots, a_i, \ldots, a_n\}$ and an associated set of
probabilities $\{P(a_1), \ldots, P(a_i), \ldots, P(a_n)\}$. The random
variable is usually denoted by a capital letter, say A, and particular
outcomes by lowercase letters $a_i$. We then write $P(A = a_i) = 0.25$ to
denote the probability that the outcome of A is $a_i$.

When A can assume one of a countable number of outcomes, the random
variable is discrete. An example is the number of times we throw a six in
10 rolls of a die: the only possibilities are none, one time, two times,
three times, and so on, up to 10 times. This is 11 possible outcomes in
total. Where A can assume any value over some range, the random variable is
continuous. A set of measurements of the height of students in a class can
be regarded as a continuous random variable, since potentially any specific
height in a range from, say, 1.2 to 2.4 m might be recorded. Thus, one
student might be 1.723 m tall, while another could be 1.7231 m tall, and
there are an infinite number of exact measurements that could be made.
Many observational data sets are approximated well by a small number of
mathematically defined random variables, or probability distributions,
which are frequently used as a result. Some of these are discussed in
the sections that follow.

ating sample The Binomial Distributi on
ne important The binomial distribution is a discrete random variable that applies when a
ervations, in series of trials are conducted where the probability of some event occurring
tat our ob~er­ in each individual trial is known. and the overall probability of some num-
, cases where ber of occurrences of the event is of interest. A typical example is the prob-
ere data are ability distribution associated with throwing x "sixes" when throwing a die
presentative n times. Here the set of possible outcomes is
on statistical

A = {0 sixes, 1 si.x, 2 sixes. ... . n sixes) (A.29)

and the binomial probability distribution will tell us what the probability is
J\BLES of getting a specified number of sixes.
outcomes of The binomial probability distribution is given by
le. Note that
mcs may be
~t of possible
enoted by a
wherep is the probability ofthe outcome ofinterest in each trial, there are n
ttcrs a, . We
trials, and the outcome of interest occurs x times. x may take any value
outcome of A
between 0 a nd n. For example, for the probability of getting two sixes in five

rolls of a die, we have n = 5, x = 2. and p = so that

Applying th€
P(two sixes)= ( 5)(1)2(1- 1)5 2
throwing ad
2 6 6
5! ) (1)6 (5)6
= (2!3! X


5 X 4 X 3 X 2 X 1) (1 1) (5 5 5)
= ( (2 X 1)(3 X 2 X 1) X 6X 6 X 6X 6 X 6 (A.31) with a standi

= (~~0) (3~) G~~)


93,312 Note that we
= 0.16075 long-run ave
many times.
Figure A.3 shows the probabilities of the different numbers of sixes for five
rolls of a die. You can see that it is more likely that we will roll no sixes or
only one six than that we will roll two.
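The binomial probabilities for five rolls of a die are quick to tabulate. The following is a minimal sketch in which the function name `binomial_pmf` is hypothetical; it evaluates the number of combinations times $p^x (1-p)^{n-x}$ for each possible number of sixes.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x occurrences in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 5          # rolls of the die
p = 1 / 6      # probability of a six on each roll

probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probabilities):
    print(f"{x} sixes: {prob:.5f}")
```

The probabilities for none and for one six come out equal here, and both are larger than the probability of two sixes.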
For statistical purposes it is useful to know the mean, variance, and The Poisson
standard deviation of a random variable. For a binomial random variable rences of an
these are given by fixed time pe
is the averag
in each unit)
J1 - np
observing x c
a 2 = np(1 - p) (A.32)
a = Jrlp(1p)

which has pa

[Figure A.3: bar chart titled "Binomial distribution, n = 5, p = 0.166667"; horizontal axis: Number of sixes]
Figure A.3 Histogram showing the probabilities of rolling different numbers of sixes
for five rolls of a die.

Applying these results we find that the mean or expected value when
throwing a die 5 times and counting 'sixes' is

$$\mu = np = 5 \times \frac{1}{6} = 0.8333 \tag{A.33}$$

with a standard deviation of

$$\sigma = \sqrt{np(1 - p)} = \sqrt{5 \times \frac{1}{6} \times \frac{5}{6}} = 0.8333 \tag{A.34}$$

Note that we would never actually observe 0.833 six. Rather, this is the
long-run average that we would expect if we conducted the experiment
many times.
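
The idea of a long-run average can be illustrated with a small simulation. This is only a sketch: the seed, the number of repeated experiments, and the variable names are arbitrary assumptions made for the illustration, not values from the text.

```python
import random
from math import sqrt

n, p = 5, 1 / 6

# Theoretical mean and standard deviation of a binomial random variable.
mean = n * p
std_dev = sqrt(n * p * (1 - p))

# Simulate many repetitions of "roll a die 5 times and count the sixes".
random.seed(42)          # arbitrary seed, chosen only for repeatability
experiments = 100_000    # arbitrary number of repetitions
counts = [
    sum(random.random() < p for _ in range(n))  # sixes in one experiment
    for _ in range(experiments)
]
sample_mean = sum(counts) / experiments

print(f"theoretical mean = {mean:.4f}, std dev = {std_dev:.4f}")
print(f"simulated mean over {experiments} experiments = {sample_mean:.4f}")
```

Every individual experiment yields a whole number of sixes, yet the simulated mean should settle close to 0.8333 as the number of experiments grows.
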

The Poisson Distribution

The Poisson distribution is useful when we observe the number of occurrences
of an event in some fixed unit of area, length, or volume or over a
fixed time period. The Poisson distribution has only one parameter λ, which
is the average intensity of events (i.e., the mean number of events expected
in each unit). This is usually estimated from the sample. The probability of
observing x events in one unit is given by

$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \tag{A.35}$$

which has parameters

$$\mu = \lambda, \qquad \sigma = \sqrt{\lambda} \tag{A.36}$$

Figure A.4 shows the probabilities for a Poisson distribution with λ = 2.
This distribution is important in the analysis of point patterns (see
Chapters 3 and 4).
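
Poisson probabilities for an intensity of 2 events per unit can be tabulated with a few lines of Python. This is a minimal sketch that assumes the standard form of the Poisson probability function; the names `poisson_pmf` and `lam` are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of observing exactly x events in one unit when the
    mean number of events per unit (the intensity) is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 2.0  # intensity used for the illustration
for x in range(11):
    print(f"P({x} events) = {poisson_pmf(x, lam):.4f}")
```

With an intensity of 2, the probabilities of observing one event and two events are equal, and the probabilities fall away quickly for larger counts.
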

Continuous Random Variables

Both the binomial and Poisson distributions are discrete variables, in which
the meaning of the probability assigned to any particular outcome is
obvious. In the continuous case it is less so. For example, the chance that
any student in a class will have a height of precisely 175.2 cm is very small,
almost zero, in fact. We can only speak of a probability that a measurement
will lie in some range of values. Continuous random variables are therefore
defined in terms of a probability density function, which enables the
calculation of the probability that a value between given limits will be observed.

[Figure A.4: bar chart titled "Poisson distribution, λ = 2"; horizontal axis: Number of events in unit]
Figure A.4 Histogram of the Poisson distribution for λ = 2.

The Uniform Distribution

In the uniform distribution, every outcome is equally likely over the range
of possible outcomes. If we knew that the shortest student in a class was
160 cm tall, and the tallest 200 cm, and we thought that heights were
uniformly distributed, we would have the continuous uniform distribution
shown in Figure A.5. As shown in the diagram, the probability that a
student's height is between any particular pair of values a and b is given by the
area under the line between these values. Mathematically, this is expressed as

$$P(a \le x \le b) = \int_{x=a}^{x=b} f(x)\,dx \tag{A.37}$$

where f(x) is the probability density function. The units for the probability
density therefore depend on the measurement units for the variable, and
the area under the line must always total to 1 since something must occur
with certainty. This is the only time you will ever see an integration (∫)
symbol in this book. The calculus to determine the area under standard
continuous probability functions has already been done by others, and is
recorded in statistical tables. In the next sections, two of the most
frequently encountered and therefore most completely defined continuous
random variables are described.

[Figure A.5: plot of the uniform density; horizontal axis: Height (centimeters), with a, b, and 200 marked]
Figure A.5 Uniform distribution. The probability of a measurement between a and b
is given by the area of the shaded rectangle.

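For the uniform distribution the area under the density is simply the area of a rectangle, so no calculus is needed. The sketch below assumes the 160 cm to 200 cm height range from the example; the function name `uniform_probability` and the clipping of a and b to that range are illustrative choices.

```python
def uniform_probability(a: float, b: float,
                        low: float = 160.0, high: float = 200.0) -> float:
    """P(a <= x <= b) for a continuous uniform distribution on [low, high].

    The density is constant at 1 / (high - low), so the probability is
    the width of the interval multiplied by that constant height.
    """
    # Only the part of [a, b] that lies inside [low, high] contributes.
    a = max(a, low)
    b = min(b, high)
    if b <= a:
        return 0.0
    density = 1.0 / (high - low)  # units: probability per centimeter
    return (b - a) * density


print(uniform_probability(170.0, 180.0))  # a 10 cm band out of 40 cm
print(uniform_probability(160.0, 200.0))  # the whole range
print(uniform_probability(175.2, 175.2))  # a single exact height
```

The last call shows why a single exact value has zero probability in the continuous case: the rectangle has no width.
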
The Normal Distribution

It is unlikely that student heights are distributed uniformly. They are much
more likely to approximate to a normal distribution. This is illustrated in
Figure A.6. A particular normal distribution is defined by two parameters:
its mean μ and its standard deviation σ. The probability density function is
given by

$$N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right] \tag{A.38}$$

where x is a particular value that the variable might take. The standardized
form of this equation for a normal distribution with mean of 0 and standard
deviation of 1 is denoted N(0, 1) and is given by

$$N(z, 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2} \tag{A.39}$$

which shows how the probability of a normally distributed variable falling
in any particular range can be determined from its z-score alone. Tables of
the normal distribution are widely available that make this calculation
simple. This, together with the central limit theorem (see Section A.5), is
the reason for the importance of this distribution in statistical analysis. It is
useful to know that 68.3% of the area under the normal curve lies within
one standard deviation, 95.5% within two standard deviations, and 99.7%
within three standard deviations of the distribution mean.

[Figure A.6: normal density curve; horizontal axis: Z-score, from -4 to 4]
Figure A.6 Normal distribution.

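The areas quoted for one, two, and three standard deviations can be checked without statistical tables. The sketch below relies on the standard relationship between the normal cumulative distribution and the error function `math.erf`; the helper names `normal_cdf` and `area_within` are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve N(0, 1) to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def area_within(k: float) -> float:
    """Area under the standard normal curve between z = -k and z = +k."""
    return normal_cdf(k) - normal_cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * area_within(k):.2f}%")
```

Because the calculation uses z-scores only, the same percentages apply to any normal distribution, whatever its mean and standard deviation.
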
The Exponential Distribution

Many natural phenomena follow an approximately exponential distribution.
A good example is the length of time between catastrophic events
(earthquakes, floods of given severity). The formulas for the exponential
distribution are

$$f(x) = \frac{e^{-x/\theta}}{\theta} \tag{A.40}$$

where θ is a constant parameter that defines the distribution. The
probability that a value higher than any particular value will be observed is
conveniently calculated for the exponential distribution, according to

$$P(x \ge a) = e^{-a/\theta} \tag{A.41}$$

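The exceedance formula (A.41) is easy to evaluate directly. In the sketch below the value θ = 50 years is a purely hypothetical parameter chosen for illustration, not a figure from the text, and the function names are likewise illustrative. A crude midpoint-rule integration of the density is included only as a consistency check.

```python
from math import exp


def exponential_pdf(x: float, theta: float) -> float:
    """Exponential probability density with parameter theta, for x >= 0."""
    return exp(-x / theta) / theta


def exceedance_probability(a: float, theta: float) -> float:
    """P(x >= a): probability of observing a value of at least a."""
    return exp(-a / theta)


theta = 50.0  # hypothetical parameter, e.g. years between floods
a = 100.0     # hypothetical threshold, in the same units

# Midpoint-rule estimate of the area under the density from 0 to a.
steps = 10_000
width = a / steps
area_below = sum(
    exponential_pdf((i + 0.5) * width, theta) * width for i in range(steps)
)

print(f"P(x >= {a}) from the formula: {exceedance_probability(a, theta):.4f}")
print(f"1 - area under density from 0 to {a}: {1 - area_below:.4f}")
```

The two printed values should agree closely, since the area above a and the area below a must together total 1.
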
HYPOTHESIS TESTING

We now come to one of the key ideas in statistics. A set of observations is
often a sample of the population from which it is drawn. A voter survey is a
good example. Often, a sample is the only feasible way to gather data. It
