The Elements of Statistics: Appendix A
Table A.1 Commonly Used Symbols and Their Meaning in This Book

Symbol     Meaning
x          The "easting" geographical coordinate or a general data value
y          The "northing" geographical coordinate or a general data value
z, a, b    The numerical value of some measurement recorded at the
           geographical coordinates (x, y)
n, m       The number of observations in a data set
k          Either an arbitrary constant or, sometimes, the number of entities
           in a spatial neighborhood
d          Distance
w          The strength or weight of interaction between locations
s          An arbitrary (x, y) location
386 GEOGRAPHIC INFORMATION ANALYSIS
values, we need another notational device: subscripts. Subscripts are small italic letters or numbers below and to the right of normal mathematical symbols: The i in $z_i$ is a subscript. A subscript is used to signify that there may be more than one item of the type denoted by the symbol, so $z_i$ stands in for a series or set of z values: $z_1$, $z_2$, $z_3$, and so on. This has various uses:

• A set of values is written between braces, so that $\{z_1, z_2, \ldots, z_{n-1}, z_n\}$ tells us that there are n elements in this set of z values. If required, the set as a whole may be denoted by a capital letter: Z. A typical value from the set Z is denoted $z_i$, and we can abbreviate the previous partial listing of z's to simply $Z = \{z_i\}$, where it is understood that the set has n elements.
• In spatial analysis, it is common for the subscripts to refer to locations at which observations have been made and for the same subscripts to be used across a number of different data sets. Thus, $h_7$ and $t_7$ refer to the values of two different observations (say, height and temperature) at the same location (i.e., location number 7).
• Subscripts may also be used to distinguish different calculations of (say) the same statistic on different populations or samples. Thus, $\mu_A$ and $\mu_B$ would denote the means of two different data sets, A and B.

The symbols i and j usually appear as subscripts in one or other of these ways. A particularly common usage is to denote summation operations, which are indicated by use of the $\Sigma$ symbol (another Greek letter, this time capital sigma). This is where subscripts come into their own, because we can specify a range of values that are summed to produce a result. Thus,
the sum

$$a_1 + a_2 + a_3 + a_4 + a_5 + a_6 \tag{A.1}$$

is denoted

$$\sum_{i=1}^{i=6} a_i \tag{A.2}$$

indicating that summation of a set of a values should be carried out on all the elements from $a_1$ to $a_6$. For a set of n "a" values this becomes

$$\sum_{i=1}^{i=n} a_i \tag{A.3}$$
which is usually abbreviated to either

$$\sum_{i=1}^{n} a_i \tag{A.4}$$

or to

$$\sum_{i} a_i \tag{A.5}$$

where the number of values in the set of a's is understood to be n. If instead of the simple sum we wanted the sum of the squares of the a values we have

$$\sum_{i=1}^{n} a_i^2 \tag{A.6}$$

instead. Or perhaps we require the sum of the products of two sets of values, a and b:

$$\sum_{i=1}^{n} a_i b_i \tag{A.7}$$

In spatial analysis, more complex operations may be carried out between two sets of values, and these can require two summation operators. For example,

$$c = k \sum_{i=1}^{n} \sum_{j=1}^{n} (z_i - z_j)^2 \tag{A.8}$$

indicates that c is to be calculated in two stages. First, we take each z value in turn (the outer i subscript) and sum the square of its value minus every z value in turn (the j subscript). You can figure this out by imagining first setting i to 1 and calculating the inner sum, which would be $\sum_j (z_1 - z_j)^2$. We then set i to 2 and do the summation $\sum_j (z_2 - z_j)^2$, and so on, all the way to $\sum_j (z_n - z_j)^2$. The final double summation is the sum of all of these individual sums, and c is equal to this sum multiplied by k. This will seem complex at first, but you will get used to it.

In the next section you will see immediately how these notational tools make it easy to write down operations like finding the mean value of a data set. Other elements of notation will be introduced as they are required and explained in the appendices and main text.
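For readers who find loops easier to follow than sigma notation, the summations described above can be sketched in Python. This is only an illustration: the lists `a` and `z`, the constant `k`, and the function name `double_sum` are made-up values and names, not anything taken from the book.

```python
# Illustrative values only; a, z, and k are invented for this sketch.
a = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

# Simple sum of a_1 ... a_n.
simple_sum = sum(a)

# Sum of the squares of the a values.
sum_of_squares = sum(a_i ** 2 for a_i in a)

z = [1.0, 3.0, 6.0]
k = 0.5


def double_sum(z, k):
    """Return k times the sum over i and j of (z_i - z_j) squared."""
    total = 0.0
    for z_i in z:              # outer i subscript
        inner = 0.0
        for z_j in z:          # inner j subscript
            inner += (z_i - z_j) ** 2
        total += inner         # add this individual inner sum
    return k * total


c = double_sum(z, k)
print(simple_sum, sum_of_squares, c)
```

The nested loops mirror the two-stage reading of the double summation: the inner loop produces one inner sum for each fixed i, and the outer loop adds those inner sums together before the result is multiplied by k.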
Population Parameters

These are presented without comment. A population mean $\mu$ is given by

$$\mu = \frac{1}{n} \sum_{i=1}^{n} a_i \tag{A.9}$$

population variance $\sigma^2$ is given by

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i - \mu)^2 \tag{A.10}$$

and population standard deviation $\sigma$ is given by

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - \mu)^2} \tag{A.11}$$

These statistics are referred to as population parameters.
Sample Statistics

The statistics above are based on the entire population of interest, which may not be known. Most descriptive statistics are calculated for a sample of the entire population. They therefore have two purposes: First, they are summary descriptions of the sample data, and second, they serve as estimates of the corresponding population parameters. The sample mean is

$$\bar{a} = \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} a_i \tag{A.12}$$

the sample variance is

$$s^2 = \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (a_i - \bar{a})^2 \tag{A.13}$$

and the sample standard deviation is

$$s = \hat{\sigma} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (a_i - \bar{a})^2} \tag{A.14}$$
In all these expressions the ^ ("hat") symbol indicates that the expression is an estimate of the corresponding population parameter. Note the different symbols for these sample statistics relative to the corresponding population statistics.

Sample statistics may be used as unbiased estimators of the corresponding population parameters, and a major part of inferential statistics is concerned with determining how good an estimate a sample statistic is of the corresponding population parameter. Note particularly the denominator (n - 1) in the sample variance and standard deviation statistics. This reflects the loss of one degree of freedom (df) in the sample statistic case because we know that $\sum_i (a_i - \bar{a}) = 0$, so that given the values of $\bar{a}$ and $a_1, \ldots, a_{n-1}$, the value of $a_n$ is known. The expressions shown, using the (n - 1) denominator, are known to produce better estimates of the corresponding population parameters than those obtained using n.

Together, the mean and variance or standard deviation provide a convenient summary description of a data set. The mean is a measure of central tendency and is a typical value somewhere in the middle of all the values in the data set. The variance and standard deviation are measures of spread, indicative of how dispersed the values in the data set are.
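The difference between the n and (n - 1) denominators is easy to see numerically. The sketch below uses a made-up list `a` (not data from the book) and cross-checks the hand-written formulas against Python's standard `statistics` module, whose `pvariance` and `variance` functions use the population and sample denominators respectively.

```python
import statistics

# Hypothetical measurements; any list of numbers would do.
a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(a)
mean = sum(a) / n

# Deviations from the mean always sum to zero, which is why knowing the
# mean and n - 1 of the values fixes the last value.
dev_sum = sum(x - mean for x in a)

squared_devs = sum((x - mean) ** 2 for x in a)

# Population forms divide by n.
pop_var = squared_devs / n
pop_sd = pop_var ** 0.5

# Sample forms divide by n - 1.
samp_var = squared_devs / (n - 1)
samp_sd = samp_var ** 0.5

# Cross-check against the standard library.
lib_pop_var = statistics.pvariance(a)
lib_samp_var = statistics.variance(a)

print(mean, pop_var, pop_sd, samp_var, samp_sd)
```

With only eight values the two variances differ noticeably; as n grows the difference between dividing by n and by n - 1 shrinks.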
Z-Scores

The z-score of a value $a_i$ relative to its population is given by

$$z_i = \frac{a_i - \mu}{\sigma} \tag{A.15}$$

and relative to a sample by

$$z_i = \frac{a_i - \bar{a}}{s} \tag{A.16}$$

The z-score indicates the place of a particular value in a data set relative to the mean, standardized with respect to the standard deviation. z = 0 is equivalent to the sample mean, z > 0 is a value greater than the mean, and z < 0 is less than the mean. The z-score is used extensively in determining confidence intervals and in assessing statistical significance.
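A minimal sketch of the sample z-score calculation follows, assuming a made-up data list `a`. It uses `statistics.stdev`, which applies the (n - 1) denominator, so it corresponds to the sample form of the z-score rather than the population form.

```python
import statistics

# Hypothetical sample values.
a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

a_bar = statistics.mean(a)   # sample mean
s = statistics.stdev(a)      # sample standard deviation, (n - 1) denominator

# One z-score per value: distance from the mean in standard deviations.
z = [(a_i - a_bar) / s for a_i in a]

for a_i, z_i in zip(a, z):
    print(f"{a_i:5.1f}  z = {z_i:+.3f}")
```

Values equal to the mean get z = 0, values below it get negative scores, and values above it get positive scores.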
Median, Percentiles, Quartiles, and Box Plots

Other descriptive statistics are based on sorting the values in a data set into numerical order and describing them according to their position in the ordered list. The first percentile in a data set is the value below which 1%, and above which 99%, of the data values are found. Other percentiles are defined similarly, and certain percentiles are frequently used as summary statistics. The 50th percentile is the median, sometimes denoted M. Half the values in a data set are below the median and half are above. Like the mean, the median is a measure of central tendency. Comparison of the mean and median may indicate whether or not a data set is skewed. If $\bar{a} > M$, this indicates that high values in the data set are pulling the mean above the median; such data are right skewed. Conversely, if $\bar{a} < M$, a few low values may be "pulling" the mean below the median and the data are left skewed.

Skewed data sets are common in human geography. A good example is often provided by ethnicity data in administrative districts. For example, Figure A.1 is a histogram for the African-American percentage of population in the 67 Florida counties as estimated for 1999. The strong right skew in these data is illustrated by the histogram, with almost half of all the counties having African-American populations of 10% or less. The right skew is confirmed by the mean and median of these data. The median percent African-American is 11.65%, whereas the mean value is higher at 14.17%, and the small number of counties with higher percentages of African-Americans pull the mean value up relative to the median. The median often gives a better indication of what constitutes a typical value in a data set.

[Figure A.1: histogram titled "Florida by county"; horizontal axis: percent African-American; n = 67.]
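The comparison of mean and median described above can be sketched in a few lines. The `values` list is invented for illustration and is not the Florida county data; the label strings are likewise just names chosen for this example.

```python
import statistics

# Made-up percentages with a few high values, not the Florida data.
values = [2, 3, 4, 5, 6, 8, 10, 12, 40, 55]

mean = statistics.mean(values)
median = statistics.median(values)

# A mean above the median suggests a right skew; below it, a left skew.
if mean > median:
    skew = "right skewed"
elif mean < median:
    skew = "left skewed"
else:
    skew = "no skew indicated"

print(mean, median, skew)
```

Here the two large values (40 and 55) pull the mean well above the median, which is the pattern the text describes for right-skewed data.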
Two other percentiles are frequently reported. These are the lower and upper quartiles of a data set, which are the 25th and 75th percentiles, respectively. If we denote these values by $Q_{25}$ and $Q_{75}$, the interquartile range (IQR) of a data set is given by

$$\mathrm{IQR} = Q_{75} - Q_{25} \tag{A.17}$$

The interquartile range contains half the data values in a data set and is indicative of the range of values. The interquartile range, as a measure of data spread, is less affected by extreme values than are simpler measures such as the range (the maximum value minus the minimum value) or even the variance and standard deviation.

A useful graphic that gives a good summary picture of a data set is a box plot. A number of variations on the theme exist (so that it is important to state the way in which any plot that you present is defined), but the diagram on the left-hand side of Figure A.2 is typical. This plot summarizes the same Florida percent African-American data as in Figure A.1. The box itself is drawn to extend from the lower to the upper quartile value. The horizontal line near the center of the box indicates the median value. The whiskers on the plot extend to the lowest and highest data values within one-and-a-half IQRs below $Q_{25}$ and above $Q_{75}$. Any values beyond these limits, that is, less than $Q_{25} - 1.5(\mathrm{IQR})$ or greater than $Q_{75} + 1.5(\mathrm{IQR})$, are regarded as outliers and marked individually with point symbols. If either the minimum or maximum data value lies inside the 1.5 IQR limits, the fences at the ends of the whiskers are drawn at the minimum or maximum value as appropriate. This presentation gives a good general picture of the data distribution. In the example illustrated on the left, the minimum value is greater than $Q_{25} - 1.5(\mathrm{IQR})$, at around 2%, and is marked by the lower fence. There are four outlier values above $Q_{75} + 1.5(\mathrm{IQR})$, three at around 40%, and one at around 55%.

[Figure A.2: Box plots of Florida ethnicity data. Left: vertical box plot of percent African-American with the box, fences, and outliers labeled. Right: parallel horizontal box plots of percent African-American, percent Hispanic, and percent white; horizontal axis: % of population.]

Several data sets may be compared using parallel box plots. This has been done on the right-hand side of Figure A.2, where Florida county data for percent Hispanic and percent white have been added to the plot. Note that box plots may also be drawn horizontally, as here. This example shows that the African-American and Hispanic population distributions are both right skewed, although the typical variability among African-American populations is greater. The white population distribution is left skewed, on the other hand, and has higher typical values.
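The box plot construction described above reduces to a handful of calculations. The following sketch uses an invented `values` list (not the Florida data) and the standard library's `statistics.quantiles`. Quartile conventions differ between software packages, so the `method="inclusive"` choice here is an assumption of this example, not necessarily the convention behind Figure A.2.

```python
import statistics

# Made-up percentages, not the Florida county data.
values = [2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 40, 55]

# With n=4, quantiles() returns the three quartile cut points.
q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q75 - q25

# Limits beyond which values count as outliers.
lower_limit = q25 - 1.5 * iqr
upper_limit = q75 + 1.5 * iqr

outliers = [v for v in values if v < lower_limit or v > upper_limit]

# Fences sit at the most extreme data values still inside the limits.
inside = [v for v in values if lower_limit <= v <= upper_limit]
lower_fence = min(inside)
upper_fence = max(inside)

print(q25, median, q75, iqr, lower_fence, upper_fence, outliers)
```

In this made-up example the minimum value lies inside the lower limit, so the lower fence is drawn at the minimum, while the two largest values fall beyond the upper limit and would be plotted as individual outlier points.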
A.3. PROBABILITY THEORY

A great deal of statistics depends on the ideas of probability theory. Probability theory is a mathematical way of dealing with unpredictable events. It enables us to assign probabilities to events on a scale from 0 (will never happen) to 1 (will definitely happen). The most powerful aspect of probability theory is that it provides standard ways of calculating the probability of complex composite events (for example, A and B happening when C does not happen) given estimates for the probability of each of the individual events A, B, and C happening on its own.

In probability theory, an event is defined as a collection of observations in which we are interested. To calculate the probability of an event, we first enumerate all the possible observations and count them. Then we determine how many of the possible observations satisfy the conditions for the event we are interested in to have occurred. The probability of the event is the number of outcomes that satisfy the event definition, divided by the total number of possible outcomes. For example, the probability of you winning the big prize in a lottery is given by

$$P(\text{lottery win}) = \frac{\text{number of ways you win}}{\text{number of combinations that could come up}} \tag{A.18}$$

Note that the notation P(A) is read as "the probability of event A occurring." Since the number of possible ways that you can win (with one ticket) is 1, and the number of possible combinations of numbers that could be drawn is usually very large (in the UK national lottery, it is 13,983,816), the probability of winning the lottery is usually very small. In the UK national lottery it is

$$P(\text{lottery win}) = \frac{1}{13{,}983{,}816} \approx 0.0000000715$$
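The figure of 13,983,816 combinations can be checked directly. The sketch below assumes the draw picks 6 numbers from 49 without regard to order, which is the format that yields exactly this count; the text itself does not state the format, so treat the 6-from-49 detail as an assumption of this example.

```python
import math

# Assumption: 6 numbers are drawn from 49 and their order does not matter.
numbers_in_draw = 49
numbers_picked = 6

combinations = math.comb(numbers_in_draw, numbers_picked)

# One ticket covers exactly one combination.
ways_to_win = 1
p_win = ways_to_win / combinations

print(combinations, p_win)
```

Buying more tickets with distinct combinations would raise `ways_to_win` and scale the probability proportionally, which follows directly from the counting definition of probability.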
two red aces in the pack. The various probabilities are now $P(A) = \frac{26}{52} = \frac{1}{2}$, $P(B) = \frac{1}{13}$, and $P(A \cap B) = \frac{2}{52} = \frac{1}{26}$, so that $P(A \cup B)$, the probability of drawing a card that is either red or an ace, is $\frac{1}{2} + \frac{1}{13} - \frac{1}{26} = \frac{7}{13} = 53.8\%$.
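The card calculation can be verified by brute-force enumeration. This sketch assumes, as the quoted probabilities imply, that event A is "the card is red" and event B is "the card is an ace" in a standard 52-card pack; the helper name `prob` and the deck representation are choices made for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event by counting the cards that satisfy it."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_ace(card):
    return card[0] == "A"


p_a = prob(is_red)
p_b = prob(is_ace)
p_a_and_b = prob(lambda card: is_red(card) and is_ace(card))
p_a_or_b = prob(lambda card: is_red(card) or is_ace(card))

# Adding P(A) and P(B) counts the two red aces twice, so subtract them once.
by_formula = p_a + p_b - p_a_and_b

print(p_a, p_b, p_a_and_b, p_a_or_b, float(p_a_or_b))
```

Using `Fraction` keeps the arithmetic exact, so the direct count and the formula can be compared without rounding error.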
Conditional probability refers to the probability of events given certain preconditions. The probability of A given B, written $P(A : B)$, is given by

$$P(A : B) = \frac{P(A \cap B)}{P(B)} \tag{A.23}$$

This is obvious if you think about it. If B must happen, $P(B)$ is proportional to the number of all possible outcomes. Similarly, $P(A \cap B)$ is proportional to the number of events that count, that is, those where A has occurred given that B has also occurred. Equation (A.23) then follows as a direct consequence of our definition of probability.
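A small enumeration makes the counting argument concrete. The example below is hypothetical and not from the book: a single die is rolled, B is "the roll is even" and A is "the roll is a six". The conditional probability is computed once from the formula and once by counting only the outcomes where B holds.

```python
from fractions import Fraction

outcomes = list(range(1, 7))   # faces of one fair die


def prob(event):
    """Probability of an event by counting outcomes that satisfy it."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_six(o):
    return o == 6


def is_even(o):
    return o % 2 == 0


# From the formula: P(A given B) = P(A and B) / P(B).
p_b = prob(is_even)
p_a_and_b = prob(lambda o: is_six(o) and is_even(o))
p_a_given_b = p_a_and_b / p_b

# Direct count: restrict attention to outcomes where B has occurred.
b_outcomes = [o for o in outcomes if is_even(o)]
direct = Fraction(sum(1 for o in b_outcomes if is_six(o)), len(b_outcomes))

print(p_a_given_b, direct)
```

Both routes give the same answer because conditioning on B simply shrinks the set of possible outcomes to those in B.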
A particularly important concept is event independence. Two events are independent if the occurrence of one has no effect at all on the likelihood of the other. In this case,

$$P(A : B) = P(A) \tag{A.24}$$

$$P(B : A) = P(B) \tag{A.25}$$

which, combined with equation (A.23), we can rearrange to get the important result for independent events A and B that

$$P(A \cap B) = P(A)P(B) \tag{A.26}$$

This result is the basis for the analytic calculation of many results for complex probabilities and one reason why the assumption of event independence is often important in statistics. As an example of independent events, think of two dice being rolled simultaneously. If event A is "die 1 comes up a six" and event B is "die 2 comes up a six," the events are independent, since the outcome on one die can have no possible effect on the outcome of the other. Thus the probability that both dice come up six is $P(A)P(B) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} = 2.78\%$.
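The two-dice example can be checked by listing all 36 equally likely outcomes. This is a sketch for illustration; the helper name `prob` is chosen here and is not from the book.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event by counting rolls that satisfy it."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


p_a = prob(lambda r: r[0] == 6)                    # die 1 comes up a six
p_b = prob(lambda r: r[1] == 6)                    # die 2 comes up a six
p_both = prob(lambda r: r[0] == 6 and r[1] == 6)   # both come up six

# For independent events the joint probability is the product.
product_rule = p_a * p_b

print(p_a, p_b, p_both, round(float(p_both) * 100, 2))
```

Only one of the 36 outcomes has both dice showing six, which matches the product of the two individual probabilities.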
The number of permutations of k elements, which may be chosen from a set of n elements is

$$P^n_k = \frac{n!}{(n-k)!} \tag{A.27}$$

where x! denotes the factorial of x and is given by $x \times (x-1) \times (x-2) \times \cdots \times 3 \times 2 \times 1$, and 0! is defined equal to 1. The equivalent expression for the number of combinations of k elements, which may be chosen from a set of n elements is

$$C^n_k = \binom{n}{k} = \frac{n!}{k!(n-k)!} \tag{A.28}$$

These two expressions turn out to be important in many situations, and we will use the combinations expression to derive the expected frequency distribution associated with complete spatial randomness (see Chapter 3).
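The two counting formulas can be checked against an explicit listing. The sketch below picks k = 2 letters from the made-up set A, B, C; the factorial expressions are compared with Python's built-in `math.perm` and `math.comb` and with enumerations from `itertools`.

```python
import itertools
import math

elements = ["A", "B", "C"]   # illustrative set, n = 3
n = len(elements)
k = 2

# Factorial formulas for permutations and combinations.
n_permutations = math.factorial(n) // math.factorial(n - k)
n_combinations = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Explicit listings: order matters for permutations, not for combinations.
listed_permutations = list(itertools.permutations(elements, k))
listed_combinations = list(itertools.combinations(elements, k))

print(n_permutations, listed_permutations)
print(n_combinations, listed_combinations)
print(math.perm(n, k), math.comb(n, k))
```

Each combination of k elements corresponds to k! permutations, which is why the combinations formula has the extra k! in its denominator.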
Word of Warning about Probability Theory

The power of probability theory comes at a price: We have to learn to think of events in a very particular way, a way that may not always be applicable. The problem lies in the fact that probability theory works best in a world of repeatable observations, the classic examples being the rolling of dice and the flipping of coins. In this world we assign definite probabilities to outcomes based on simple calculations (the probability of rolling a six is 1/6, the probability of heads is 1/2) and over repeated trials (many rolls of the die, or many flips of a coin) we expect outcomes to match these calculations. If they don't match, we suspect a loaded die or unfair coin. In fact, this is a very particular concept of probability. There are at least three distinct uses of the term:
When A can assume one of a countable number of outcomes, the random variable is discrete. An example is the number of times we throw a six in 10 rolls of a die: the only possibilities are none, one time, two times, three times, and so on, up to 10 times. This is 11 possible outcomes in total. Where A can assume any value over some range, the random variable is continuous. A set of measurements of the height of students in a class can be regarded as a continuous random variable, since potentially any specific height in a range from, say, 1.2 to 2.4 m might be recorded. Thus, one student might be 1.723 m tall, while another could be 1.7231 m tall, and there are an infinite number of exact measurements that could be made. Many observational data sets are approximated well by a small number of mathematically defined random variables, or probability distributions, which are frequently used as a result. Some of these are discussed in the sections that follow.
and the binomial probability distribution will tell us what the probability is of getting a specified number of sixes.

The binomial probability distribution is given by

$$P(x) = \binom{n}{x} p^x (1-p)^{n-x} \tag{A.30}$$

where p is the probability of the outcome of interest in each trial, there are n trials, and the outcome of interest occurs x times. x may take any value between 0 and n. For example, for the probability of getting two sixes in five rolls of a die, we have n = 5, x = 2, and p = 1/6, so that
$$
\begin{aligned}
P(\text{two sixes}) &= \binom{5}{2} \left(\frac{1}{6}\right)^{2} \left(1 - \frac{1}{6}\right)^{5-2} \\
&= \frac{5!}{2!\,3!} \times \left(\frac{1}{6}\right)^{2} \times \left(\frac{5}{6}\right)^{3} \\
&= \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(3 \times 2 \times 1)} \times \left(\frac{1}{6} \times \frac{1}{6}\right) \times \left(\frac{5}{6} \times \frac{5}{6} \times \frac{5}{6}\right) \\
&= \frac{15{,}000}{93{,}312} \\
&= 0.16075
\end{aligned}
\tag{A.31}
$$

Figure A.3 shows the probabilities of the different numbers of sixes for five rolls of a die. You can see that it is more likely that we will roll no sixes or only one six than that we will roll two.
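The worked example can be reproduced with a short function. This is a sketch: the name `binomial_pmf` is chosen for this example, and it simply evaluates the binomial expression (number of combinations, times p to the power x, times 1 - p to the power n - x) for each possible number of sixes.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x outcomes of interest in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n = 5        # five rolls of a die
p = 1 / 6    # probability of a six on any one roll

# Probabilities for 0, 1, ..., 5 sixes.
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
p_two = probs[2]

for x, px in enumerate(probs):
    print(f"{x} sixes: {px:.5f}")
```

The printed values for no sixes and for one six are both larger than the value for two sixes, and all six probabilities sum to 1, as they must for a complete set of outcomes.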
For statistical purposes it is useful to know the mean, variance, and standard deviation of a random variable. For a binomial random variable these are given by

$$
\begin{aligned}
\mu &= np \\
\sigma^2 &= np(1-p) \\
\sigma &= \sqrt{np(1-p)}
\end{aligned}
\tag{A.32}
$$

Figure A.3 Histogram showing the probabilities of rolling different numbers of sixes for five rolls of a die. [Panel title: binomial distribution, n = 5, p = 0.166667; vertical axis: probability; horizontal axis: number of sixes.]
Applying these results we find that the mean or expected value when throwing a die 5 times and counting "sixes" is

$$\mu = np = 5 \times \frac{1}{6} = 0.8333 \tag{A.33}$$

with a standard deviation of

$$\sigma = \sqrt{np(1-p)} = \sqrt{5 \times \frac{1}{6} \times \frac{5}{6}} = 0.8333 \tag{A.34}$$

Note that we would never actually observe 0.833 six. Rather, this is the long-run average that we would expect if we conducted the experiment many times.
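As a check on these values, the mean and standard deviation can be computed both from the closed-form binomial expressions and directly from the probabilities of each possible count. The variable names below are chosen for this sketch.

```python
import math

n = 5        # rolls of the die
p = 1 / 6    # probability of a six on each roll

# Closed-form binomial results.
mean = n * p
variance = n * p * (1 - p)
sd = math.sqrt(variance)

# The same quantities from the full set of probabilities.
pmf = [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
variance_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in enumerate(pmf))

print(round(mean, 4), round(sd, 4))
print(round(mean_from_pmf, 4), round(math.sqrt(variance_from_pmf), 4))
```

For this particular n and p the mean and the standard deviation happen to coincide at 5/6; that is a coincidence of the numbers chosen, not a general property of the binomial distribution.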
(A.36)

[Figure A.4: Poisson distribution, λ = 2; vertical axis: probability.]
In the uniform distribution, every outcome is equally likely over the range of possible outcomes. If we knew that the shortest student in a class was 160 cm tall, and the tallest 200 cm, and we thought that heights were uniformly distributed, we would have the continuous uniform distribution shown in Figure A.5. As shown in the diagram, the probability that a student's height is between any particular pair of values a and b is given by the area under the line between these values. Mathematically, this is expressed as

$$P(a \le x \le b) = \int_{x=a}^{x=b} f(x)\,dx \tag{A.37}$$

where f(x) is the probability density function. The units for the probability density therefore depend on the measurement units for the variable, and the area under the line must always total to 1 since something must occur with certainty. This is the only time you will ever see an integration ($\int$) symbol in this book. The calculus to determine the area under standard continuous probability functions has already been done by others, and is recorded in statistical tables. In the next sections, two of the most frequently encountered and therefore most completely defined continuous random variables are described.

Figure A.5 Uniform distribution. The probability of a measurement between a and b is given by the area of the shaded rectangle. [Horizontal axis: height (centimeters), from 160 to 200, with a and b marked.]
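For the student-height example, the area under the uniform density is just the area of a rectangle, and a crude numerical integration gives the same answer. The function names and the choice of a = 170 and b = 180 are assumptions made for this sketch; the 160 to 200 cm range comes from the text.

```python
LOW, HIGH = 160.0, 200.0   # shortest and tallest heights in cm


def uniform_pdf(x, low=LOW, high=HIGH):
    """Constant density inside the range, zero outside it."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def uniform_prob(a, b, low=LOW, high=HIGH):
    """Area of the rectangle between a and b, clipped to the range."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


p_exact = uniform_prob(170.0, 180.0)
p_numeric = integrate(uniform_pdf, 170.0, 180.0)
total_area = uniform_prob(LOW, HIGH)

print(p_exact, p_numeric, total_area)
```

The density here is 1/40 per centimeter, which illustrates the point that the units of a probability density depend on the measurement units of the variable.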
The Normal Distribution

It is unlikely that student heights are distributed uniformly. They are much more likely to approximate a normal distribution. This is illustrated in Figure A.6. A particular normal distribution is defined by two parameters: its mean $\mu$ and its standard deviation $\sigma$. The probability density function is given by

$$N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \tag{A.38}$$

where x is a particular value that the variable might take. The standardized form of this equation for a normal distribution with mean of 0 and standard deviation of 1 is denoted N(0, 1) and is given by

$$N(z, 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \tag{A.39}$$

which shows how the probability of a normally distributed variable falling in any particular range can be determined from its z-score alone. Tables of the normal distribution are widely available that make this calculation simple. This, together with the central limit theorem (see Section A.5), is the reason for the importance of this distribution in statistical analysis. It is useful to know that 68.3% of the area under the normal curve lies within one standard deviation, 95.4% within two standard deviations, and 99.7% within three standard deviations of the distribution mean.

Figure A.6 Normal distribution. [Horizontal axis: z-score, from -4 to 4; vertical axis: probability density.]
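The areas within one, two, and three standard deviations can be computed without tables. The sketch below evaluates the normal density directly and uses the error function `math.erf` from the standard library for the areas; the function names `normal_pdf` and `area_within` are chosen for this example.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_within(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2.0))


peak = normal_pdf(0.0)   # height of the standard normal curve at its mean
areas = {k: area_within(k) for k in (1, 2, 3)}

print(round(peak, 4))
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {100 * area:.2f}%")
```

Because the areas depend only on k, the same proportions (roughly 68%, 95%, and 99.7%) apply to any normal distribution once values are converted to z-scores.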
The Exponential Distribution

Many natural phenomena follow an approximately exponential distribution. A good example is the lengths of time between catastrophic events (earthquakes, floods of given severity). The formulas for the exponential distribution are

$$f(x) = \frac{e^{-x/\theta}}{\theta} \tag{A.40}$$

and