02 Hurlburt CH 2


Chapter

2 Variables and Their Measurement

2.1 Levels of Measurement


2.2 Continuous and Discrete Variables
Real Limits • Significant Figures • Rounding

2.3 Summation
Notation • Computations

2.4 Connections
Cumulative Review • Computers • Homework Tips

Exercises for Chapter 2

Personal Trainer
Lectlet 2A: Variables and Their Measurement
Labs: Lab for Chapter 2
dataGen: Statistical Computational Package and Data Generator
reviewMaster 2A
Resource 2X: Additional Exercises


ch02.indd 15 6/7/17 4:02 PM


16 Chapter 2 Variables and Their Measurement

Learning Objectives
❶ What is a variable?
❷ What are the characteristics of the three measurement scales (nominal, ordinal,
and interval/ratio) used to measure variables? Why do we lump interval and ratio
variables together?
❸ What is the difference between a continuous and a discrete variable?
❹ What are the real limits of a measurement?
❺ What are the three parts of the rule about rounding?
❻ How is summation notation used to simplify the communication about sums?

A “variable” can take on any of several values. There are three kinds of variables (nominal, ordinal, and interval/ratio) based on their level of measurement.
Most statistical computation involves summing values; the shorthand indication for a sum is ∑.
Recall Pygmalion: Rosenthal and Jacobson (1968) led teachers to believe that the second-graders identified as “bloomers” would show IQ spurts but the other children would not spurt. There are three variables: the pretest IQ, the posttest IQ, and the intellectual growth (IQ gain). All three are interval/ratio variables.

Key terms: variable, measurement, nominal, ordinal, interval/ratio

Pronunciation:
∑ (SIG∙ma) = “sum of”
Xᵢ (EX∙sub∙EYE)
∑(2X − 4)² = “sum of two X minus four that quantity squared”
DataGen (DAY∙ta∙jen)

Statistics can be thought of as the science of understanding data; data are the results of a series of measurements on one or more variables. Let us be clear about what these terms mean.
variable: A characteristic that can take on several or many different values

A variable is any characteristic of the world that can be measured and that can take on any of several or many different values. For example, height is a variable (defined as the number of inches a person is tall). It takes on the value 70 inches when John is measured and 64 inches when Mary is measured.
Values of a variable may sometimes be assigned arbitrarily; for example, biological sex is a variable to which we may assign the values 1 for female and 2 for male (or 2 for female and 1 for male, or 27 for female and 136 for male, or any other values we choose).

measurement: The procedure for assigning a value to a variable

Measurement is the procedure that assigns values to the variable. For our purposes, it is enough to realize that a measurement rule must provide a unique and unambiguous result for every individual. Thus, in our sex example, although it makes no difference what values are assigned to female and male, the assignment must be the same for all females and for all males. It is not satisfactory measurement to begin by assigning 1 to females and then, halfway through our data collection, change our minds and begin assigning 27.
Personal Trainer (Lectlets): Click Lectlets and then 2A in the Personal Trainer for an audiovisual discussion of Sections 2.1 through 2.4.


2.1 Levels of Measurement


Statisticians distinguish three kinds of variables and measurement levels: nominal, ordinal, and interval/ratio. The main characteristics of these kinds of variables are given in Table 2.1.

nominal scale: Classification of unordered variables

The nominal scale of measurement classifies objects into categories based on some characteristic of the object. Examples of nominal measurements of people are male/female, Republican/Democrat/Independent/decline to state, fraternity member/nonmember, and Californian/Ohioan/Nebraskan/Alaskan/and so on. In each case, the measurement is the placing of an individual into one of the categories in the measurement scale. We require that the categories be mutually exclusive; that is, for example, a person cannot be both a Republican and a Democrat or both an Ohioan and a Nebraskan.
The order of categories in a nominal variable is not important. For example, it doesn’t matter from a measurement standpoint whether we refer to the sex distinction as male/female or female/male. It may be useful to assign a numerical value to a nominal category; we might let male = 1 and female = 2. Doing that does not change the fact that the order is irrelevant; it would have made just as much measurement sense to let male = 2 and female = 1.
ordinal scale: Measurement of variables that have an inherent natural order

The ordinal scale of measurement classifies objects into mutually exclusive categories based on some characteristic of the object (as does the nominal level of measurement), and furthermore it requires that this classification have some inherent, logical order. An example of ordinal measurement of people is class standing (freshman/sophomore/junior/senior) because it is both mutually exclusive (you can’t be both a freshman and a sophomore) and ordered (sophomore is more advanced than freshman). Other examples are class rank (first in class/second/third/etc.), grade in course (A/B/C/D/F), and level of depression (not depressed/slightly depressed/moderately depressed/severely depressed).
We may assign a numerical value to these categories, and if we do, the order of these categories is important (unlike in nominal variables where order is irrelevant). For example, it does make sense to assign freshman = 1, sophomore = 2, junior = 3, and senior = 4, but it does not make sense to assign sophomore = 1, senior = 2, freshman = 3, and junior = 4. Order is inherent in ordinal variables, and the values of the variables must reflect that order.

TABLE 2.1 Characteristics of nominal, ordinal, and interval/ratio levels of measurement

                                                       Level of Measurement
Characteristic                                         Nominal   Ordinal   Interval/Ratio
Categories are mutually exclusive.                     Yes       Yes       Yes
Categories have logical order.                         No        Yes       Yes
Equal differences in characteristic imply
equal differences in value.                            No        No        Yes


interval/ratio scale: A measurement scale for ordered variables that has equal units of measurement

The interval/ratio level of measurement classifies objects into mutually exclusive categories based on some characteristic of the object (as do the nominal and ordinal levels of measurement). It requires that this classification have some inherent, logical order (as does the ordinal level of measurement). Furthermore, it requires that the width of all the categories be the same. An example of an interval/ratio variable is temperature as measured in degrees Celsius (°C). When we require that the intervals be equal, we mean, for example, that the temperature difference between 34°C and 35°C is the same as the difference between 77°C and 78°C. It follows that the distance between nonconsecutive measurements must also be equal if the measured differences are equal. For example, the temperature difference between 34°C and 37°C must be the same as that between 75°C and 78°C because both differences are 3 degrees.

BOX 2.1 Ratio level of measurement

Note: This box may be omitted without loss of continuity.

Some statisticians divide the interval/ratio level of measurement into two separate levels, called the interval level and the ratio level. Their “interval” level is the same as what we have labeled “interval/ratio.”

ratio scale: An interval scale of measurement that has a true zero point

The ratio level of measurement has all the characteristics of the interval level, and furthermore it requires that the scale have a true zero point. Examples of ratio measurements are weight (expressed as number of pounds), height (number of inches), and time (number of minutes). A true zero point means that the thing being measured actually vanishes when the scale reads zero. Thus, for example, the variable weight measures the heaviness of an individual; when the weight is 0 pounds, there is in fact no weight.
Note that the existence of a true zero point on a scale does not imply the existence of an individual whose measurement is zero. There are no 0-pound humans, for example. The existence of a true zero simply means that if there were an individual with no weight, then the weight scale would read 0.
Temperature as measured in degrees Celsius is not a ratio scale because 0°C does not mean the absence of heat; 0°C is instead a relatively arbitrary point, the freezing point of pure water at sea level. The true zero of the Celsius scale (the complete absence of heat) is actually about −273°C. By contrast, the Kelvin scale of temperature has absolute zero temperature as the 0 K point of the scale (thus making the freezing point of water about +273 K). The Kelvin scale is therefore a ratio scale.
For most statistical procedures (including all those described in this book), there is no difference between the interval and the ratio levels of measurement. Therefore, we will lump interval and ratio scales together, referring to either as an interval/ratio scale.

We must make the distinction among nominal, ordinal, and interval/ratio variables because statistics that are appropriate for variables measured at one level of measurement may not be appropriate for variables measured at a different level. For example, if Mary is 64 inches tall and John is 70 inches tall, it is reasonable to say that their average height is 67 inches. However, suppose we code political party preference as Democrat = 1, Republican = 2, and Independent = 3 and that Mary is a Democrat and John is Independent. If we average Mary’s and John’s party preferences, we get (1 + 3)/2 = 4/2 = 2, which might lead us to the totally unreasonable conclusion that Mary and John’s average political party preference is Republican. This unreasonableness arises because the average is an appropriate statistic for interval/ratio variables such as height but not for nominal variables such as political preference. We will return to this discussion in Chapter 4.

Note: Nominal Ordinal Interval Ratio = NOIR as in “film noir”
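
The contrast between the two averages can be checked in a few lines of Python. This is only an illustrative sketch: Python is not one of the programs this textbook uses, and the variable names and the alternative coding scheme are assumptions made up for the example. Because the codes of a nominal variable are arbitrary, an equally valid recoding changes the "average," whereas the average height keeps its meaning.

```python
from statistics import mean

# Interval/ratio variable: heights in inches (values from the text).
heights = {"Mary": 64, "John": 70}
average_height = mean(list(heights.values()))

# Nominal variable: party preference, coded as in the text.
party_codes = {"Democrat": 1, "Republican": 2, "Independent": 3}
preferences = {"Mary": "Democrat", "John": "Independent"}
average_code = mean([party_codes[p] for p in preferences.values()])

# A hypothetical recoding that is just as valid, because order is
# irrelevant for a nominal variable.
alternative_codes = {"Republican": 1, "Independent": 2, "Democrat": 3}
average_alternative = mean([alternative_codes[p] for p in preferences.values()])

print(average_height)       # 67
print(average_code)         # 2, which happens to be the code for Republican
print(average_alternative)  # 2.5, which is the code for no party at all
```

The average of the codes depends entirely on which arbitrary numbers were assigned, which is one way to see that it is not a meaningful statistic for a nominal variable.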

2.2 Continuous and Discrete Variables


continuous variable: One with an infinite number of values between adjacent scale values

Variables at the ordinal or interval/ratio level of measurement can be classified as either continuous or discrete (all nominal variables are discrete). A continuous variable is one that has an infinite number of possible values between any two adjacent scale values. For example, height is a continuous variable because there is no limit to the refinement we can make in the measurement operation. If we measure more and more precisely, we can ascertain that Mary’s height is 64.4 inches, or 64.37 inches, or 64.372 inches, or 64.3719 inches.

discrete variable: One with no possible intermediate values between two adjacent points

A discrete variable is one that has no possible intermediate values between two adjacent points. For example, the number of children a woman has is a discrete variable, with possible values of only whole numbers, not 1.5 or 2.993.
To restate: If, for any two values of a variable, you can imagine a meaningful intermediate value, then the variable is continuous. Otherwise, the variable is discrete.

Real Limits
real limits of a measurement: The points that are half the measuring unit above and below the measured value

When we say that Mary is 64 inches tall, we do not in general mean that she is exactly 64 inches tall; she could actually be 63.8 inches or 64.372 inches tall. All measurements made on continuous variables are only approximations because of the (theoretically) infinite number of possible measured values. When we say that Mary is 64 inches tall, we really mean (if we are to be precise) that Mary is taller than 63.5 inches and shorter than 64.5 inches. We sometimes say, then, that Mary’s height is 64 ± .5 inches. The two values 63.5 and 64.5 are referred to as the real limits of the measured value of 64 inches. The real limits are the points that are half the measuring unit above and below the measured value. For example, if Mary’s time in the 100-yard dash is recorded as 10.6 seconds, the real limits of that measurement are 10.55 and 10.65 seconds, half the measuring unit (.1 second) below and above the measured value of 10.6 seconds. Equivalently, we can say that Mary’s time is 10.6 ± .05 seconds.
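
The definition of real limits translates directly into a small computation. The Python sketch below is an illustration only; the function name `real_limits` and its string arguments are assumptions, not part of the textbook or its software. It uses the standard `decimal` module so that values such as 10.55 are represented exactly.

```python
from decimal import Decimal
from typing import Tuple


def real_limits(measured: str, unit: str) -> Tuple[Decimal, Decimal]:
    """Return the (lower, upper) real limits of a measured value.

    The limits lie half a measuring unit below and above the measurement.
    Values are passed as strings so Decimal stores them exactly.
    """
    value = Decimal(measured)
    half_unit = Decimal(unit) / 2
    return value - half_unit, value + half_unit


# Height recorded to the nearest inch.
print(real_limits("64", "1"))
# Dash time recorded to the nearest tenth of a second.
print(real_limits("10.6", "0.1"))
```

The two calls reproduce the examples in the text: 63.5 and 64.5 inches for Mary's height, and 10.55 and 10.65 seconds for her time.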

Significant Figures
Statistical computations on continuous variables often result in values that have more
digits than the numbers in the original data. For example, if Mary is 60 inches tall and
both Natasha and Carla are 67 inches tall, then the average height of these three women
is (60 + 67 + 67)/3 = 64.66666666 . . . inches. Two questions should be answered about
such a result: (1) How many of those digits should we report if that average height is
to be the final result of our computations? (2) How many digits should we carry if that
average is a subcomputation whose result will not be reported directly but will be used
in some later computation?


In the behavioral sciences, there is no generally accepted answer to either of these questions. I (and many others) have these two suggestions:
1. Report in the final answer two more significant figures than were reported in the original data. Thus, because our height data were originally reported as whole numbers, our final answer should be reported as a value with two decimal places (that is, as 64.67 inches). If the original data had been reported in tenths, we would report a final answer with three decimal places, and so on.
2. Carry as intermediate subcomputations at least three more significant figures than were originally reported. Thus, the average height used as an intermediate value should be carried forward as 64.667 inches. It is simpler (and often better) to carry forward as many figures as your calculator will hold and round only when you report the final answer.

Rounding
Now that we have determined how many digits to report, we must decide how to determine the value of the final digit. Why, for example, did we write the final value of the average height as 64.67 and not as 64.66 inches?

rounding: The procedure for abbreviating a number

The procedure for determining the final digit is called rounding and is generally accepted among all statisticians. The rule for rounding has three parts:

Part 1 (remainder < 5). If the remainder (the digits beyond the last digit to be reported) is less than 500 . . . , simply discard the remainder. For example, if we wish to round 8.347 pounds to one decimal place, the remainder is 47, which is smaller than 500 . . . , so we discard the remainder and round 8.347 to 8.3 pounds.

Part 2 (remainder > 5). If the remainder is greater than 500 . . . , add 1 to the last digit before the remainder and then discard the remainder. For example, if we wish to round 4.86524 miles to one decimal place, the remainder is 6524, which is larger than 500 . . . , so we increase the 8 by 1 and discard the remainder, thus rounding 4.86524 to 4.9 miles.

Part 3 (remainder = 5). If the remainder is exactly 500 . . . (that is, exactly halfway between two candidates for the final digit), then use the even final digit and discard the remainder. For example, if we wish to round 7.850 seconds to one decimal place, the remainder is 50, which is exactly equal to 500. . . . Thus, 7.850 is exactly halfway between 7.8 and 7.9, so we select the even final digit, 8. Thus 7.850 rounds down to 7.8 seconds (not up to 7.9 seconds as perhaps you learned in math class). Here’s another example of rounding rule part 3: if we wish to round 2.35 gallons to one decimal place, the remainder is 5, which is equal to 500. . . . 2.35 is exactly halfway between 2.3 and 2.4, so we round to the even final digit, 4. Thus, 2.35 rounds up to 2.4 gallons.

Note: Part 3 may not be what you learned in math class. If the remainder is 500, statisticians round down half the time and round up half the time.

Note: Frequent mistake: rounding 3.4501 to 3.4. 3.4501 rounds up to 3.5 (part 3 does not apply because the remainder is not exactly 500 . . .).

Some students find it easy to remember the following expression, which states part 3 of the rule in a different but equivalent way: “Even: leave it. Odd: up.” For example, suppose we wish to round 72.3500 feet to one decimal place. The remainder is “500” (that is, exactly 5), so part 3 applies. Because the digit just before the remainder is “3,” which is odd, we follow “Odd: up” and round up to 72.4 feet. On the other hand, we may wish to round 81.65 grams to one decimal place. The remainder is “5” and the digit just before the remainder is “6,” which is even, so we follow “Even: leave it.” We leave the 6 as it is and round to 81.6 grams.
Part 3 of the rule for rounding is necessary to prevent a systematic bias from entering our data as a result of rounding. See Box 2.2 for an explanation.
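
The three-part rule corresponds to what Python's standard `decimal` module calls `ROUND_HALF_EVEN`. The sketch below is an illustration rather than anything this textbook prescribes, and the helper name `round_half_even` is an assumption. Values are passed as strings because a binary floating-point number such as 2.35 is generally not stored exactly, so it may not sit exactly halfway between its two candidates.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places using the
    three-part rule (an exact half goes to the even final digit)."""
    # Decimal(1).scaleb(-places) is 1, 0.1, 0.01, ... for places 0, 1, 2, ...
    target = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(target, rounding=ROUND_HALF_EVEN)


print(round_half_even("8.347", 1))    # part 1: 8.3
print(round_half_even("4.86524", 1))  # part 2: 4.9
print(round_half_even("7.850", 1))    # part 3, even digit kept: 7.8
print(round_half_even("2.35", 1))     # part 3, odd digit raised: 2.4
print(round_half_even("3.4501", 1))   # part 2, not part 3: 3.5
```

Each printed value matches the corresponding worked example in the text, including the frequent mistake of treating 3.4501 as an exact half.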

BOX 2.2 How part 3 of the rounding rule prevents bias


Suppose your friend Izzy Buyust tells you that part 3 of the rounding rule is too
complicated: Why not just round all numbers ending in the digit 5 upward? Here’s
how you can show Izzy that his procedure causes a statistical bias in his results. Add
up the 100 numbers 0.1, 0.2, 0.3, . . . , 9.7, 9.8, 9.9, and 10.0. The sum is exactly
505. Now ask Izzy first to round each number using his procedure and then to sum
the rounded values. His sum will be 510, not 505. When you round all numbers
correctly (using part 3 of the rounding rule when necessary), your sum will be 505
as desired.
Why the discrepancy? When you and Izzy round, you both leave 10 numbers
unchanged (1.0 rounds to 1, 2.0 rounds to 2, etc.). Izzy rounds 50 numbers upward
(all those that end in .5, .6, .7, .8, and .9) but only 40 numbers downward (those that
end in .1, .2, .3, and .4), and therefore his procedure biases the sum upward (to 510
instead of 505). You, on the other hand, use part 3 of the rounding rule, so you round
45 values upward (half of those that end in .5 as well as all those that end in .6, .7,
.8, and .9) and 45 values downward (half of those that end in .5 as well as all those
that end in .1, .2, .3, and .4), and your sum comes out to be 505. Thus, part 3 of the
rounding rule lets you avoid Izzy’s bias.
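
The arithmetic in this box can be reproduced with a short Python sketch. It is offered only as a check, and it assumes that Izzy's procedure corresponds to the `ROUND_HALF_UP` mode of the standard `decimal` module and that part 3 of the rounding rule corresponds to `ROUND_HALF_EVEN`.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# The 100 numbers 0.1, 0.2, ..., 9.9, 10.0, stored exactly as decimals.
values = [Decimal(k) / 10 for k in range(1, 101)]
whole = Decimal("1")  # round to zero decimal places

exact_sum = sum(values)
izzy_sum = sum(v.quantize(whole, rounding=ROUND_HALF_UP) for v in values)
rule_sum = sum(v.quantize(whole, rounding=ROUND_HALF_EVEN) for v in values)

print(exact_sum)  # 505.0
print(izzy_sum)   # 510
print(rule_sum)   # 505
```

Rounding every exact half upward inflates the sum by 5, while rounding exact halves to the even digit leaves the sum of the rounded values equal to the sum of the original values.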

Table 2.2 gives more examples of the rounding process.

TABLE 2.2 Examples of rounding

Decimal Places   Original                 Rounded   Rounding Rule
Desired          Value       Remainder    Value     Part
1                712.31      1            712.3     1
2                3.697       7            3.70      2
1                5.350       50           5.4       3
0                247.499     499          247       1 (not 3!)
1                84.25       5            84.2      3
1                84.2501     501          84.3      2
2                .005        5            .00       3
0                800.501     501          801       2 (not 3!)

Note that the last column gives the rounding rule part.


2.3 Summation
Notation
It is often convenient to assign a symbol to take the place of the name of a variable. Most often we call the variable X; if there are two variables, we call them X and Y; and generally we call a third variable Z. When it is convenient, we use other symbols; for example, we might call height H instead of X.
A data set is a collection of measurements on one or more variables. It is often convenient to assign an index variable, usually i, that refers to a particular measurement’s position in a data set. Table 2.3 shows the measurements on the variable height for five people. When we let i = 1, we refer to the first person, John. When we let i = 4, we refer to the fourth person, Carla. The order of data in a data set is usually arbitrary; we could arrange the people in alphabetical order, in which case i = 1 would refer to Carla. For sample data, we usually refer to the last value of i as n. Thus, n is the number of observations in the sample; n = 5 for the data in Table 2.3.

TABLE 2.3 Heights of five people

Person    Index   Height (in.)
John      1       70
Mary      2       64
Natasha   3       65
Carla     4       65
Sam       5       68

Xᵢ: The ith value of X

We often put a subscript on the symbol for the variable when we wish to refer to a specific data point rather than the variable in general. Thus, X₁ (pronounced EX∙sub∙WON or just EX∙WON) refers to the first value of X; in Table 2.3 (assuming we are using X to denote height), X₁ = 70 inches, John’s height, and X₄ = 65 inches, Carla’s height. In general, Xᵢ (pronounced EX∙sub∙EYE) refers to the ith value of X.

Pronunciation:
Xᵢ (EX∙sub∙EYE)
∑ (SIG∙ma) = “sum of”
ith (EYEth)

Many of the computations in statistics involve summing all the entries in a data set. To make communication about such sums more compact, statisticians have developed summation notation. For purposes of this discussion, we will assume that we have made two measurements, X and Y, on each of six participants, as shown in Table 2.4.

TABLE 2.4 Measurements on variables X and Y

i   Xᵢ   Yᵢ
1   7    5
2   2    12
3   4    4
4   2    10
5   3    9
6   6    9

∑: Symbol meaning “the sum of”

We indicate the sum of all the values of the variable X as ∑Xᵢ, which we read as “the sum of X sub i.”¹ The character ∑ (pronounced “SIG∙ma”) is the Greek capital S (for “sum”; ∑ looks like an E to some students, but it’s really an S). For the data of Table 2.4, we can expand the indicated sum as follows:

Sum of values of a variable

∑Xᵢ = X₁ + X₂ + X₃ + X₄ + X₅ + X₆     (2.1)
    = 7 + 2 + 4 + 2 + 3 + 6
    = 24

In cases where it is unambiguous, we may drop the i subscript, understanding that ∑X means ∑Xᵢ. Occasionally, again when the situation is unambiguous, we use three dots (called an “ellipsis”) to indicate a sum; for example, we may write Equation (2.1) as ∑Xᵢ = X₁ + X₂ + … + X₆, where the ellipsis (…) stands for the missing X₃ + X₄ + X₅.

Computations
The calculations required in statistics (and throughout this book) often require us to perform some calculations on the variable before we perform the summation. Suppose, for the data in Table 2.4, we wish to compute ∑X². That expression says, “Sum all the values of X².” Note carefully that ∑X² does not say “Sum all the values of X and then square the sum” (that would be [∑X]², which is a very different thing). Thus, the first principle of summation is to be careful to observe what is being summed.

¹ Our summation notation ∑Xᵢ is a simplification of the more complete notation sometimes used by statisticians, ∑ᵢ₌₁ⁿ Xᵢ, which is read “the sum of Xᵢ for all values of i from i = 1 to n.” Because in this book sums will always be taken over all the values of the variable (that is, in this book sums always run from i = 1 to n), we can use the simpler notation without ambiguity.

The computation of ∑X² is shown in Table 2.5. Note that we put the original values of X in the first column and then create a new column to hold the X² values. The required sum is then obtained simply by adding down the new column: ∑X² = 118.

TABLE 2.5 Computing ∑X²

X    X²
7    (7)² = 49
2    (2)² = 4
4    16
2    4
3    9
6    36
     ∑X² = 118

Calculations may be even more complex. For example, ∑3X² is obtained by first squaring the X values, as shown in the second column of Table 2.6, and then multiplying each squared X value by 3, as shown in the third column. Then ∑3X² = 354 is obtained by simply adding down that last column.

TABLE 2.6 Computing ∑3X²

X    X²    3X²
7    49    3(49) = 147
2    4     3(4) = 12
4    16    48
2    4     12
3    9     27
6    36    108
           ∑3X² = 354

See Box 2.3 for a note about the organization of computations.

BOX 2.3 A note regarding computations

The time you spend solving statistics problems can be divided into two parts: time spent performing the computations and time spent finding errors that you made in those computations. Many students tend to overlook the impact that error finding has on their total study time. There are two ways to minimize the error-finding time: Work carefully and without distraction in the first place, and organize your work so that errors are easy to locate. Plan on making errors! To err is human! Failing to plan for errors is also human but not very smart!
Organize your computations so that you are performing only one operation at a time. For example, in computing ∑3X² for the values of X in Table 2.6, you might be tempted to square the numbers and multiply them in the same step, going through a mental process such as: “7 squared is 49 times 3 is 147, 2 squared is 4 times 3 is 12, 4 squared is 16 times 3 is 48, . . . .” That seems to be more efficient because it eliminates writing down a column of values (the second column of Table 2.6); however, alternating back and forth between squaring and multiplying substantially increases the likelihood of making a mistake. It is much better to perform all the squarings first and then do all the multiplications.
You may be saying to yourself, “I can square and multiply in my head; I don’t need that extra step!” and you may be right most of the time. But such a statement ignores the fact that if you do make a mistake, the mistake will be much harder to find when you have combined two computational steps.
I highly recommend that computations be performed in this manner: (1) List the data in a column, (2) create a new column for every subcomputation, and (3) perform a summation only by adding directly down one column.
Thus, to compute the sum ∑3X², we (1) list the original data as shown in the first column of Table 2.6, (2) create a new column with the heading “X²” and perform the squarings, and another new column with the heading “3X²” and perform the multiplications, and then (3) obtain the sum by simply adding down that last column.
One operation per column leads to the lowest probability of making a mistake.
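
The one-operation-per-column habit carries over naturally to code. The Python sketch below is an illustration only (the list names are assumptions, and Python is not one of the programs this textbook uses): each list plays the role of one column, and each sum is taken by adding down a single column.

```python
X = [7, 2, 4, 2, 3, 6]  # the X values from Table 2.4

# One operation per column.
x_squared = [x ** 2 for x in X]               # column headed X^2
three_x_squared = [3 * v for v in x_squared]  # column headed 3X^2

# Print the columns side by side so each step can be checked by eye.
print(f"{'X':>3} {'X^2':>5} {'3X^2':>6}")
for x, x2, three_x2 in zip(X, x_squared, three_x_squared):
    print(f"{x:>3} {x2:>5} {three_x2:>6}")

sum_x = sum(X)                              # sum of X
sum_x_squared = sum(x_squared)              # sum of X^2
square_of_sum = sum_x ** 2                  # (sum of X)^2, a different thing
sum_three_x_squared = sum(three_x_squared)  # sum of 3X^2

print(sum_x, sum_x_squared, square_of_sum, sum_three_x_squared)
# 24 118 576 354
```

Keeping the intermediate lists, rather than computing everything in one expression, serves the same purpose as the extra written column: if a total looks wrong, the column that produced it can be inspected directly.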

Let’s consider another example. Suppose we wish to evaluate the sum ∑(2X − 4)² for these values of X: 7, 2, 4, 2, 3, and 6. We start by listing the data, as shown in the first column of Table 2.7. Then, to follow the suggestion in Box 2.3 that we perform only one operation at a time, we have to decide which operation to perform first. All calculations must follow the PEMDAS order-of-operations rules of basic algebra: First perform all operations inside parentheses; then perform exponentiation, then multiplication or division, and only then addition or subtraction. Therefore, we must begin with the expression inside the Parentheses, 2X − 4. In that expression, we must perform the Multiplication first, so we form a new column with the heading “2X” and do all the multiplications. Then we can do the Subtractions, creating a new column with the heading “2X − 4.” That finishes the expression inside the parentheses, so now we perform the Exponentiation outside the parentheses by creating a new column with the heading “(2X − 4)².” That is the expression we wish to sum, so we add down that column to obtain ∑(2X − 4)² = 184.

Pronunciation: ∑(2X − 4)² = “sum of two X minus four that quantity squared”

TABLE 2.7 Computing ∑(2X − 4)²

X    2X    2X − 4    (2X − 4)²
7    14    10        100
2    4     0         0
4    8     4         16
2    4     0         0
3    6     2         4
6    12    8         64
                     ∑(2X − 4)² = 184

Note: When you have built up a column heading that matches the required sum, you can sum.

Note that ∑(2X − 4)² is not equal to [∑(2X − 4)]²; that is, we do not obtain ∑(2X − 4)² simply by adding down the third column of Table 2.7 and then squaring that result. Adding down the third column gives ∑(2X − 4) = 24, and squaring that result gives [∑(2X − 4)]² = [24]² = 576, which is not equal to ∑(2X − 4)² = 184.
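
The column-by-column construction of Table 2.7 can be mirrored in Python as a quick check. This is a sketch for illustration; the dictionary `columns` and its plain-text headings are assumptions chosen to resemble the table, not anything the textbook's software provides.

```python
# Build Table 2.7 one column at a time, in PEMDAS order.
columns = {"X": [7, 2, 4, 2, 3, 6]}
columns["2X"] = [2 * x for x in columns["X"]]                 # multiplication
columns["2X - 4"] = [v - 4 for v in columns["2X"]]            # subtraction
columns["(2X - 4)^2"] = [v ** 2 for v in columns["2X - 4"]]   # exponentiation

# Sum of the squared quantities: add down the last column.
sum_of_squares = sum(columns["(2X - 4)^2"])

# Square of the sum: add down the third column, then square the total.
square_of_sum = sum(columns["2X - 4"]) ** 2

print(sum_of_squares)  # 184
print(square_of_sum)   # 576
```

The two results differ because squaring happens before the summation in the first case and after it in the second.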
Occasionally we perform computations with two variables at a time. For example, ∑XY is obtained by first listing the values of X and Y as shown in Table 2.8. Then we create a new column headed “XY” and find ∑XY = 176 by simply adding down that column.

TABLE 2.8 Computing ∑XY

X    Y     XY
7    5     35
2    12    24
4    4     16
2    10    20
3    9     27
6    9     54
           ∑XY = 176

Note that ∑XY is not equal to ∑X ∑Y. This can be verified by noting that ∑X = 24, as might be obtained by adding down the first column of Table 2.8, and ∑Y = 49, as might be obtained by adding down the second column. ∑X ∑Y is thus 24(49) = 1176, which is not equal to ∑XY = 176.
We will use summation notation in all chapters in this textbook, and it is crucial that you become confidently skilled in performing such computations.
Personal Trainer (Algebra): Click Algebra and then Summation in the Personal Trainer for practice in summation notation following the suggestion in Box 2.3.

2.4 Connections
Cumulative Review
Chapter 1 discussed statistics in general and described the Pygmalion in the classroom
experiment that we will refer to frequently throughout this textbook. It also distinguished
between populations (all the members of the group under consideration) and samples
(some subset of that group), and between parameters (characteristics of populations) and
statistics (characteristics of samples). It also defined the probability of an event as the
number of outcomes favorable to that event divided by the total number of outcomes.

Computers
Many computer programs are available that perform statistical computations—far too
many to be covered in any single book. This textbook provides examples from three
such programs: Personal Trainer’s DataGen, IBM SPSS Statistics Version 24, and
Microsoft Excel (2010 for Windows or 2011 for Macintosh).


Personal Trainer’s DataGen, which I wrote, is designed especially for the student who is seeking to comprehend statistics. It allows easy access to all the computations typically encountered in introductory statistics so that the student can do as much or as little of a computation by hand as is desired. It is remarkably easy to use: Most students have no trouble getting it to do what they want within the first 5 minutes of operation. All statistics are presented automatically; there are no “statistics” menus to select or buttons to press. DataGen is thus a program specialized for the acquisition of statistical comprehension by introductory statistics students. Because of this specialization, however, DataGen is by no means a comprehensive program and thus does not perform many important (but more advanced) statistical analyses. DataGen operates in the Excel environment; that is, your computer must have Excel (Windows 2007 or later, Macintosh 2011 or later) installed for DataGen to operate.

Note: Your instructor may suggest using DataGen, SPSS, or Excel.
IBM SPSS Statistics is a statistical package designed to analyze a wide variety
of statistical data. There are many different versions of SPSS; the one we will use for
examples is Version 24. The instructions for SPSS presented here apply to both the
Windows and the Macintosh versions of SPSS. SPSS is extremely versatile, capa­ble
of performing many statistical tasks that go far beyond the needs of the introductory
statistics student. The price for this versatility is that it is somewhat more difficult to
use than DataGen.
Microsoft Excel is a versatile spreadsheet that can be made to perform many sta-
tistical functions (not as many as SPSS, more than DataGen). It is a widely used data
management tool in industry and universities. Like SPSS, its versatility makes it some-
what more difficult to use than DataGen to explore statistical relationships.
DataGen is thus statistical explorational software; SPSS and Excel are statistical
computational software. Exploration and computation overlap, but DataGen makes the
exploration and comprehension of the basics of statistics transparent. It is much easier
for the student to see what is happening in a particular analysis. When the student
moves beyond introductory statistics, however, DataGen’s limitations become more
apparent—and then SPSS, Excel, or another complete package is the better choice.
However, I have found that students who start with DataGen usually find that the later
transition to SPSS or Excel (or other complete statistics package) is easy.
Both the exploration/comprehension and the tool-acquisition arguments are
persuasive, so this textbook will provide examples from DataGen, SPSS, and Excel.
Click ESTAT and then DataGen in the Personal Trainer. Then use DataGen to
compute ∑X and ∑Y for the data in Table 2.4:
1. In the Personal Trainer, click DataGen (Win) if you are a Windows user or
DataGen (Mac) if you are a Macintosh user. DataGen (Win) requires that
you have Microsoft Office Excel 2007 or later installed on your Windows
computer. DataGen (Mac) requires that you have Microsoft Office Excel 2011
or later installed on your Macintosh computer. If you have a choice, use the
Windows version of DataGen because Macintosh Excel does not allow multi-
ple windows to be open at the same time. That forces Macintosh users to close
one window to look inside another, a substantial inconvenience in DataGen.
2. If Excel presents a message that says “Macros have been disabled” or some-
thing similar, click Enable Content or Enable Macros or whatever similar
option is presented.


3. Enter the data from Table 2.4 into the DataGen spreadsheet. To enter values
going down a column, press Enter after each value. To enter values going
across a row, press Tab after each value. You may also navigate with the arrow
keys or the mouse. You do not need to enter the i values because ESTAT auto-
matically provides them. Enter the six X values into the Variable 1 column;
enter the Y values into Variable 2.
4. If you like, you may change the headings: Click anywhere in the Variable 1
column. Click Rename variable; enter X into the cell provided; click OK.
Also change the name of Variable 2 to Y.
5. Note that the sum of each column appears automatically in the DataGen
Descriptive Statistics window on the line labeled “Sum of observations.”
Now use DataGen to compute ∑XY:
6. Highlight the first six cells of the Variable 3 column. Then click Edit variable.
7. Next to the Product button check the boxes marked 1 and 2. Then click Product.
8. Note that the Variable 3 column now holds the product of X and Y. Note
also that the sum of these products (176) now automatically appears in the
DataGen Descriptive Statistics window. (Macintosh users will have to
close the Edit Variable window and then click Show Descriptive Statistics
Window.)
You can vary steps 6 and 7 to find any sum or product of variables.

Use SPSS to compute ∑X and ∑Y:
1. Start SPSS by clicking Start > All Programs > IBM SPSS > IBM SPSS
Statistics 24.
2. Click the Type in Data button and then OK.
3. Enter the data from Table 2.4 into the SPSS spreadsheet. To enter values going
down a column, press Enter or Tab after each value. To enter values going
across a row, press the right-arrow key after each value. You do not need to
enter the i values. Just enter two columns of data (X and Y).
4. Ask SPSS to show sums by clicking Analyze, then Descriptive Statistics,
and then Frequencies... .
5. Request var00001 to be displayed by clicking the arrow button. Then click
var00002 and click the arrow button again.
6. Click Statistics... , check the Sum box, click Continue, and click OK.
7. Figure 2.1 shows the output, with my annotations, that will appear in the
Output1 – IBM SPSS Statistics Viewer window; the sums are in the table
called “Statistics.” You should see 24.00 as the sum of VAR00001 and 49.00
as the sum of VAR00002.
Now use SPSS to compute ∑XY:
8. Return to the Data Editor by clicking the Go To Data button (looks like a grid
with a big star) on the button bar near the top of the Output1 screen.


FIGURE 2.1 Sample SPSS output: Descriptive statistics from Table 2.4

Statistics
                 VAR00001   VAR00002
N     Valid          6          6
      Missing        0          0
Sum              24.00      49.00
                  (∑X)       (∑Y)

Annotations on the output: “Check that these are correct”; the two sums are labeled ∑X and ∑Y.

9. Click var in the third column.


10. Click Transform and then click Compute Variable... .
11. Create a new variable label: Type “product” in the “Target Variable” cell.
12. Request the multiplication: Click the arrow button, click *, click var00002,
click the arrow button again, and then click OK.
13. Note that a new column is created that holds the product of X and Y. The
values should be 35.00 (= 7.00 × 5.00), 24.00, 16.00, 20.00, 27.00, and 54.00.
Quite a few steps are involved in asking SPSS to perform this simple task.
Fortunately, the effort expended here will make subsequent tasks easier. You can vary
steps 11 and 12 to ask SPSS to create and evaluate any mathematical expression, an
example of the versatility of SPSS.
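The arithmetic behind the product column can also be checked outside SPSS. The following Python sketch is only an illustration and is not part of the SPSS procedure: the list simply copies the six products reported in step 13, and the names `products` and `sum_xy` are assumptions introduced here.

```python
# The six products reported in step 13 (copied from the text, not recomputed
# from Table 2.4).
products = [35.00, 24.00, 16.00, 20.00, 27.00, 54.00]

# Summing the product column gives the sum of XY.
sum_xy = sum(products)

print(sum_xy)  # 176.0
```

The result agrees with the value of ∑XY (176) reported in the DataGen and Excel instructions.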
Use Excel to compute ∑X and ∑Y.
1. Start Excel by clicking Start > All Programs > Microsoft Office > Microsoft
Excel 2010. The instructions for Excel 2011 or later are the same.
2. Enter the data from Table 2.4 into the Excel spreadsheet, starting in the first
row of the first column. To enter values going down a column, press Enter or
the down-arrow key after each value. To enter values going across a row, press
Tab or the right-arrow key after each value. You do not need to enter the i
values. Just enter two columns of data (X and Y).
3. Highlight the six values in column A. Click the Formulas tab; then click
∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum
(24) appears in cell A7.
4. Highlight the six values in column B. Click ∑ AutoSum in the ribbon at the
top of the Excel window. Note that the sum (49) appears in cell B7.
Now use Excel to compute ∑XY.
5. Click on spreadsheet cell C1. Type =A1*B1. Don’t forget to type the equal
sign at the beginning of this entry. Then press Enter. Note that the value
35 (= 7 × 5) appears in cell C1.


6. Click on spreadsheet cell C1. Grab the fill handle (the little square
box that appears at the lower right corner of the C1 cell; when you point to it,
the square becomes a cross) and drag it down until the first six cells in column
C are highlighted. Then release the handle. Note that you have filled cells C1
through C6 with the formula that you typed in cell C1. For example, click
cell C2; note in the text entry area just above the spreadsheet that =A2*B2
appears. Excel has applied the formula you typed in cell C1, which you wrote
to apply to the values in the first row, to the values in the second row.
7. Highlight the six values in column C. Click ∑ AutoSum in the ribbon at the
top of the Excel window. Note that the sum (176) appears in cell C7. That is
∑XY.
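The column-by-column logic of the Excel steps can also be sketched in Python. This is a minimal illustration under stated assumptions: the values below are hypothetical (they are not the Table 2.4 data), and the names `col_a`, `col_b`, and `col_c` are invented here to mirror spreadsheet columns A, B, and C.

```python
# Hypothetical values for illustration only; these are NOT the Table 2.4 data.
col_a = [2, 5, 1, 4]  # plays the role of column A (the X values)
col_b = [3, 1, 6, 2]  # plays the role of column B (the Y values)

# Equivalent of filling =A1*B1 down column C: one product per row.
col_c = [a * b for a, b in zip(col_a, col_b)]

# Equivalent of clicking AutoSum under each column.
sum_x = sum(col_a)   # sum of X
sum_y = sum(col_b)   # sum of Y
sum_xy = sum(col_c)  # sum of XY

print(col_c)                 # [6, 5, 6, 8]
print(sum_x, sum_y, sum_xy)  # 12 12 25
```

For these hypothetical values, ∑XY is 25, whereas ∑X multiplied by ∑Y is 144: the sum of the products is not the product of the sums.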

Homework Tips
1. Check the list of learning objectives at the beginning of this chapter. Do you under-
stand each one?
2. The hint given in Box 2.3 will in the long run save you a lot of computational time,
and I strongly recommend it even if it seems at first like a waste of time. Almost all
students are surprised and frustrated by how many computational errors they make.
This procedure will help you avoid those costly errors.
3. Remember to include at least three extra significant figures in subcomputations.
4. Be perfectly clear that [∑(X – 10)]² is not the same as ∑(X – 10)². (This is the most
pervasive computational mistake in introductory statistics.) If we follow the advice of
Box 2.3, computing [∑(X – 10)]² requires the creation of one new column headed
“X – 10.” By contrast, computing ∑(X – 10)² requires the creation of two new columns,
the first headed “X – 10” and the second headed “(X – 10)².”
For practice, click Algebra and then Summation in the Personal Trainer.
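The difference between squaring a sum and summing squares can be made concrete with a short Python sketch. It is an illustration only: the three X values are hypothetical, and the variable names are assumptions chosen to echo the column headings described in tip 4.

```python
# Hypothetical data for illustration only.
x = [12, 15, 11]

# One new column, headed "X - 10".
x_minus_10 = [value - 10 for value in x]  # [2, 5, 1]

# Square of the sum: add the column first, then square the total.
square_of_sum = sum(x_minus_10) ** 2  # 8 squared = 64

# A second new column, headed "(X - 10) squared".
x_minus_10_squared = [d ** 2 for d in x_minus_10]  # [4, 25, 1]

# Sum of the squares: square each entry first, then add.
sum_of_squares = sum(x_minus_10_squared)  # 4 + 25 + 1 = 30

print(square_of_sum, sum_of_squares)  # 64 30
```

The two results differ (64 versus 30) because the order of squaring and summing differs, which is why the second expression needs an extra column.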
Click ReviewMaster and then Chapter 2 in the Personal Trainer for an electronic
interactive review of the concepts in Chapter 2.
Click Labs and then Chapter 2 in the Personal Trainer for interactive practice of
the skills in Chapter 2 and a quiz to test your understanding.


CHAPTER 2 Exercises

Section A: Basic Exercises
(Answers in Appendix D, page 575)

1. (a) Suppose I am interested in the effect of family size on life satisfaction. I code
       family size as 0 if four or fewer individuals live in the household and 1 if more
       than four individuals live in the household. Does such a coding constitute a
       measurement of family size? If not, why not?
   (b) Halfway through my study, I find that I am getting too many 0 family sizes, so I
       change my coding. I leave my original codings unchanged, but I now code family
       size as 0 if three or fewer individuals live in the household and 1 if more than
       three live in the household. Does such a procedure constitute measurement? If
       not, why not?

Exercise 1 worked out
   (a) Measurement is a procedure that assigns a unique and unambiguous value to each
       participant. This coding does that, so the answer is yes.
   (b) A measurement operation must be the same for all participants, but here
       four-member households are treated inconsistently. Therefore the answer is no.

2. Ray Shoscale wants to know the level of measurement (nominal, ordinal, or
   interval/ratio) for the following variables. Help Ray out.
   (a) Weights of people (measured in pounds)
   (b) The Likert scale (measured as 1 = strongly disagree; 2 = disagree; 3 = neutral;
       4 = agree; and 5 = strongly agree)
   (c) Basketball players’ uniform numbers (0, 1, 2, . . . , 55)
   (d) Clothing sizes (small, medium, large, extra large)
   (e) Shades of nail polish (21 = “shameless rose,” 25 = “pink in the afternoon,”
       33 = “orchid,” 41 = “berry rich,” 88 = “nouveau nude,” etc.)
   (f) Volumes of bottles (measured in fluid ounces)

3. Which of the following variables are discrete and which are continuous?
   (a) Distance (in miles) from one city to another
   (b) Number of bar presses a rat makes
   (c) Number of students in your statistics class
   (d) Fuel consumption (miles per gallon)

4. Consider the sample data given in the table.

   i    X
   1    6
   2    1
   3    9
   4   –5
   5    6
   6    0
   7   –3

   (a) What value does X₄ have?
   (b) What is n?
   (c) Compute ∑X.
   (d) Create a column headed “X²” as described in Box 2.3 and compute ∑X².
   (e) Compute (∑X)².
   (f) For these data, does ∑X² = (∑X)²?
   (g) Create a column headed “X + 10” and compute ∑(X + 10).
   (h) For these data, does ∑(X + 10) = ∑X + 10?
   (i) Create a column headed “X² – 5” and compute ∑(X² – 5).

5. For the data of Exercise 4, consider the sum
   $\sum \left( \frac{3X_i^2 - 6}{4} \right)$.
   (a) How many columns are required for this computation, following the suggestions
       of Box 2.3? What are their headings?
   (b) Compute the sum.

6. Use the data in the accompanying table.
   (a) Find X₃ – Y₃.
   (b) Compute ∑XY.

   X    Y
   3    2
   4    4
   7    1
   3   11


7. For the data of Exercise 6, consider the sum
   $\sum \left[ \frac{2(X - Y)^2}{X + Y} \right]$.
   (a) How many columns are required for this computation, following the suggestions
       of Box 2.3? What are their headings?
   (b) Compute the sum.

8. For the data of Exercise 6, demonstrate each expression.
   (a) ∑X² ≠ (∑X)²
   (b) ∑(X + 5) ≠ ∑X + 5
   (c) ∑(X + 5)² ≠ (∑X + 5)²
   (d) ∑XY ≠ ∑X ∑Y

9. (a) For the data of Exercise 6, how many decimal places should you use in reporting
       the value of (∑X)/3 as a final answer?
   (b) What is that value? Be sure to round correctly.

10. Round each value to two decimal places.
   (a) 14.637
   (b) –14.637
   (c) 2.152
   (d) 3.40500
   (e) 3.40501
   (f) 3.315
   (g) 3.325
   (h) 3.335
   (i) 27.43475

Section B: Supplementary Exercises

11. Use the sample data in the table.

   i    Xᵢ
   1    4
   2   –5
   3    2
   4   –8
   5   21
   6    3
   7   13
   8    6

   (a) What value does X₂ have?
   (b) What is n?
   (c) Compute ∑X.
   (d) Create a column headed “X²” as described in Box 2.3 and compute ∑X².
   (e) Compute (∑X)².
   (f) For these data, does ∑X² = (∑X)²?
   (g) Create a column headed “X – 3” and compute ∑(X – 3).
   (h) For these data, does ∑(X – 3) = ∑X – 3?
   (i) Create a column headed “X² – 9” and compute ∑(X² – 9).

12. For the data of Exercise 11, consider the sum
   $\sum \left[ \frac{(X_i - 3)(2X_i + 4)}{2} \right]$.
   (a) How many columns are required for this computation, following the suggestions
       of Box 2.3? What are their headings?
   (b) Compute the sum.

13. Consider the data in the accompanying table.
   (a) Find X₄ – Y₄.
   (b) Compute ∑XY.

    X      Y
    1.4    5
    4.1   –2
   –3.8    1
   –6.0  –10
    2.2    2

14. For the data of Exercise 13, consider the sum
   $\sum \left[ \frac{(2X - Y)(X + 2Y)}{X + Y} \right]$.
   (a) How many columns are required for this computation, following the suggestions
       of Box 2.3? What are their headings?
   (b) Compute the sum.

15. For the data of Exercise 13, demonstrate each expression.
   (a) ∑X² ≠ (∑X)²
   (b) ∑(X + 5) ≠ ∑X + 5
   (c) ∑(X + 5)² ≠ (∑X + 5)²
   (d) ∑XY ≠ ∑X ∑Y

16. (a) For the data of Exercise 13, how many decimal places should you use in
        reporting the value of (∑X)/7 as the final answer?
    (b) What is that value? Be sure to round correctly.

17. Round each value to one decimal place.
   (a) 13.55
   (b) 13.65000
   (c) –4.1472
   (d) –3.5001
   (e) –3.6001
   (f) 3.14159
   (g) .02
   (h) 6.66
   (i) 12.5


Section C: Cumulative Review
(Answers in Appendix D, page 576)

18. Suppose you take the numbers 1 through 100 and write each one on a poker chip;
    then you put all the poker chips in a bag and mix them thoroughly. You draw one
    chip from the bag. What is the probability that you draw a chip with the property
    given?
   (a) A “34”
   (b) A “129”
   (c) An odd number
   (d) An even number
   (e) Greater than or equal to 75
   (f) Less than 10
   (g) Contains “7” as one (or more) of its digits
   (h) Has digits that add to 10

19. (Exercise 18 continued) Suppose I tell you that the first digit is a “3” and it is a
    two-digit number. What is the probability that the chip is “34”?

20. (Exercise 18 continued) Suppose I tell you that the chip is an even number. What is
    the probability that the chip is “34”?

Click Resources and then 2X in the Personal Trainer for additional exercises.
