STA132 Complete Note
Course Contents:
Presentation and analysis of data. Curve fitting and goodness-of-fit tests. Construction of
questionnaires and simple index numbers. Use of random numbers and statistical tables. 90h(P);
C
SOME DEFINITIONS
Statistics (As a field of study): Statistics is the science of collecting, organizing, summarizing,
analyzing, and making inferences from data.
The subject of statistics is divided into two broad areas which are the Descriptive and Inferential
statistics.
A population consists of all elements that are being studied. For example, we may be interested
in studying the distribution of scores obtained by all the students that offered STA132 during the
2018/2019 academic session.
EDWARD CARES
A sample is a subset of the population. For example, we may be interested in studying the
distribution of scores obtained by 100 randomly selected students that offered STA132 during the
2018/2019 academic session.
Since parameters are descriptions of the population, a population can have many parameters.
Similarly, a sample can have many statistics.
In order to obtain information, data are collected from variables used to describe an event. Data
are the values or measurements that variables describing an event can assume.
Data are individual pieces of factual information recorded and used for the purpose of analysis. They
are the raw information from which statistics are created. A statistic, in turn, is a
characteristic of, or a fact about, a sample.
Both populations and samples have characteristics that are associated with them. These
are called parameters and statistics, respectively.
A parameter is a characteristic of or a fact about a population. For instance, the average age of
students in Nigerian Universities is µ, say 29 years.
A statistic is a characteristic of or a fact about a sample. For instance, the average age of n randomly
selected students (e.g. n = 15) of the University of Ilorin is x̄, say 26 years. Loosely speaking, statistics
can be regarded as the results of data analysis. We talk of statistics after some computations have
taken place that provide some understanding of what the data mean.
Example 1: Suppose X is the (random) variable that describes the scores obtained by students in the
STA132 first CA test. Then, the scores obtained by 10 randomly selected students in this test,
represented by x_i, i = 1, 2, …, 10, are given as follows:
Realizations: These are specific values of a random variable. For instance, in Example 1 above,
x_i: 20, 4, 21, … are realizations, which are all specific values of the random variable X.
Types of Variables/Data
The manner in which you analyze data depends on the type of data/variables that you are
evaluating. There are several different classifications that are used in classifying data.
Variable
A variable is an item of data that varies from one subject/unit/observation to another.
Examples of variables include quantities such as: gender, investment type, test scores, and
weight.
For instance, if X represents the monthly salary of academic staff in Nigerian universities, then
X is a variable.
Note: Variables whose values are determined by chance are called random variables.
Types/Classifications of Variables
Qualitative: Non-numerical quality
Quantitative: Numerical
Discrete: counts
Continuous: measures
Qualitative Variable/Data
Qualitative data are data values that can be placed into distinct categories, according to some
characteristic or attribute.
This variable describes the quality of something in a non-numerical format. Example: Colour
(Red, Black, ….); Gender (Male, Female); Class of Degree (First Class, Second Class upper,
….);
Counts can be applied to qualitative data, but you cannot order (if purely nominal) or measure
this type of variable. Examples are gender, marital status, geographical region of an
organization, job title….
Example 3: Distribution of Colour of cars owned by academic staff in the Faculty of Physical
Sciences, University of Ilorin.
Car Colour Red Blue Yellow Green White
Frequency 20 54 12 56 3
Quantitative Data
Quantitative or numerical data arise when the observations are frequencies or measurements
that are numeric.
Discrete Data
The data are said to be discrete if the measurements are integers (e.g. number of
employees of a company, number of incorrect answers on a test, number of
participants in a program…)
Continuous Data
The data are said to be continuous if the measurements can take on any value,
usually within some range (e.g. weight). Age and income are continuous quantitative
variables. For continuous variables, arithmetic operations such as differences and
averages make sense.
Analysis can take almost any form:
Create groups or categories and generate frequency tables.
Effective graphs include: Histograms, Stem-and-Leaf plots, Dot Plots, Box plots,
and XY Scatter Plots (2 variables).
All descriptive statistics can be applied.
Measurement Scale
Interval: ordered, and the difference between values is meaningful (e.g. standardized
scores...)
Ratio: ordered, the difference between values is meaningful, and the scale has a true zero
Note: Some “quantitative” variables can be treated only as ranks; they have a natural order, but
these values are not strictly measured (ordinal data). Examples are: 1) age group (taking the
values child, teen, adult, senior), and 2) Likert Scale data (responses such as strongly agree,
agree, neutral, disagree, strongly disagree). For these variables, the distinction between adjacent
points on the scale is not necessarily the same, and the ratio of values is not meaningful.
Analyze using:
Frequency tables
Mode, Median, Quartiles
Graphs: Bar Charts, Dot Plots, Pie Charts, and Line Charts (2 variables)
PRESENTATION OF DATA
1.1 INTRODUCTION
Once data has been collected, it has to be classified and organised in such
a way that it becomes easily readable and interpretable, that is, converted to
information. Before the calculation of descriptive statistics, it is sometimes a good
idea to present data as tables, charts, diagrams or graphs. Most people find
‘pictures’ much more helpful than ‘numbers’ in the sense that, in their opinion,
they present data more meaningfully.
1.2.1 Arrays
1. Minimum observation
2. Maximum observation
3. Number of observations, n
4. Mode
5. Median, if n is odd
Example
2 7 8 11 15
16 18 19 19 19
23 23 24 26 27
29 33 40 44 47
49 51 54 63 68
Table 1.2.1
1. Minimum = 2
2. Maximum = 68
3. Number of observations = 25
4. Mode = 19
5. Median = 24
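The five quantities read off the array above can be checked with a short script. This is only a sketch in Python using the standard `statistics` module; the variable names are illustrative and the observations are those of Table 1.2.1.

```python
from statistics import median, multimode

# Observations from Table 1.2.1, already arranged in ascending order.
array = [2, 7, 8, 11, 15, 16, 18, 19, 19, 19, 23, 23, 24,
         26, 27, 29, 33, 40, 44, 47, 49, 51, 54, 63, 68]

summary = {
    "minimum": min(array),
    "maximum": max(array),
    "n": len(array),
    "mode": multimode(array),   # a list, because a data set can have several modes
    "median": median(array),    # middle observation, since n is odd
}
print(summary)
```

Because the array is already ordered, the minimum and maximum are simply its first and last entries; the functions are used only so that the sketch also works on unordered data.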
Example
Table 1.2.2
Example
         COURSE
RESULT   BA    B Com   B Sc
Pass     37    25      33
Supp     5     10      4
Fail     11    8       27
Table 1.2.3
A line graph is usually meant for showing the frequencies for various
values of a variable. Successive points are joined by means of line segments so
that a glance at the graph is enough for the reader to understand the distribution of
the variable.
The simplest of line graphs is the single line graph, so called because it
displays information concerning one variable only, in terms of its frequencies.
Example
Table 1.3.1.1
Fig. 1.3.1.2: Single line graph of number of students against age (19 to 24).
Table 1.3.2.1
Fig. 1.3.2.2: Line graph of number of students against age (19 to 24) for UoM, DCDMBS and UTM.
The pie chart follows the principle that the angle of each of its sectors
should be proportional to the frequency of the class that it represents.
Merits
Limitations
Example
Using the same data from Table 1.2.3, but this time, including the total
number of students enrolled for BA, B Com and B Sc, we shall now display the
distribution of students for these three courses in the population.
         COURSE
RESULT   BA    B Com   B Sc
Pass     37    25      33
Supp     5     10      4
Fail     11    8       27
TOTAL    53    43      64
Table 1.4.1.1
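Since the angle of each sector is proportional to the frequency of its class, the angles and percentages for the course totals in Table 1.4.1.1 can be computed as sketched below. The function name `sector` is only illustrative.

```python
# Enrolment totals from Table 1.4.1.1.
totals = {"BA": 53, "B Com": 43, "B Sc": 64}
grand_total = sum(totals.values())

def sector(frequency, total):
    """Return (angle in degrees, percentage) for one pie sector."""
    share = frequency / total
    return 360 * share, 100 * share

sectors = {course: sector(f, grand_total) for course, f in totals.items()}
for course, (angle, percent) in sectors.items():
    print(f"{course:6s} angle = {angle:6.2f} degrees, share = {percent:4.1f}%")
```

The angles always add up to 360 degrees and the percentages to 100%, which is a quick check on the arithmetic before the chart is drawn.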
Fig. 1.4.1.2: Pie chart of enrolment by course (BA 33%, B Com 27%, B Sc 40%).
This is just an enhancement (as the name itself says) of a simple pie chart,
made in order to lay emphasis on a particular sector.
Example
Again, using the same data from Table 1.2.3, but this time, including the
total number of students enrolled for BA, B Com and B Sc, we shall now display
the distribution of students for these three courses in the population.
         COURSE
RESULT   BA    B Com   B Sc
Pass     37    25      33
Supp     5     10      4
Fail     11    8       27
TOTAL    53    43      64
Table 1.4.1.3
Fig. 1.4.1.4: Enhanced pie chart of enrolment by course (BA 33%, B Com 27%, B Sc 40%).
The bar chart is one of the most common methods of presenting data in a
visual form. Its main purpose is to display quantities in the form of bars. A bar
chart consists of a set of bars whose heights are proportional to the frequencies
that they represent.
Note that the figure may be drawn horizontally or vertically. There are
different types of bar charts, depending on the number of variables and the type of
information to be displayed.
General merits
General limitations
Note Any additional merit or limitation for each type of bar chart will be
mentioned in its corresponding section.
The simple bar chart is used for the case of one variable only. In Table
1.5.1.1 below, our variable is age.
Example
Table 1.5.1.1
Fig. 1.5.1.2: Simple bar chart of number of students against age (19 to 24).
The multiple bar chart is an extension of a simple bar chart when there are
quantities of several variables to be displayed. The bars representing the
quantities for the different variables are piled next to one another for each
attribute.
Example
         COURSE
RESULT   BA    B Com   B Sc
Pass     37    25      33
Supp     5     10      4
Fail     11    8       27
TOTAL    53    43      64
Table 1.5.2.1
Fig. 1.5.2.2: Multiple bar chart showing the results (Pass, Supp, Fail) for BA, B Com and B Sc.
Merits
Limitations
1. The figure becomes very cumbersome when there are too many variables
and components.
2. Only absolute, not relative, values are available – it is much easier to
compare component percentages across variables.
In this type of bar chart, the components (quantities) of each variable are
piled on top of one another.
Example
         COURSE
RESULT   BA    B Com   B Sc
Pass     37    25      33
Supp     5     10      4
Fail     11    8       27
TOTAL    53    43      64
Table 1.5.2.1
Fig. 1.5.2.2: Bar chart with the results (Pass, Supp, Fail) piled on top of one another for BA, B Com and B Sc.
Merits
Limitations
Fig. 1.5.4 presents the same data as for the previous example.
Fig. 1.5.4: Bar chart of results (Pass, Supp, Fail) as percentages, from 0% to 100%, for BA, B Com and B Sc.
Merits
Limitations
1.6 HISTOGRAMS
The histogram should be clearly distinguished from the bar chart. The
most striking physical difference between these two diagrams is that, unlike the
bar chart, there are no ‘gaps’ between successive rectangles of a histogram. A bar
chart is one-dimensional since only the length, and not the width, matters whereas
a histogram is two-dimensional since both length and width are important.
Example
Consider the set of data in Table 1.6.1.1, which represents the ages of
workers of a private company. The real limits and mid-class values have already
been computed.
Table 1.6.1.1
Fig. 1.6.1.2: Histogram of number of workers (frequency) against age group of workers, with classes [20.5, 25.5), [25.5, 30.5), ..., [55.5, 60.5).
When class intervals are unequal, a correction must be made. This consists
of finding the frequency density for each class, which is the ratio of the frequency
to the class interval. The frequency densities now become the actual heights of
the rectangles since the areas of the rectangles should be proportional to the
frequencies.
\[ \text{Frequency density} = \frac{\text{Frequency}}{\text{Class interval}} \]
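As a rough illustration of the correction for unequal class intervals, the sketch below computes frequency densities in Python. The classes and frequencies are hypothetical values chosen for the illustration, not the data of Table 1.6.2.1, and the function name is an assumption.

```python
# Hypothetical grouped data (not taken from the notes): (lower, upper, frequency).
classes = [(0, 10, 4), (10, 30, 12), (30, 40, 9), (40, 80, 8)]

def frequency_density(lower, upper, frequency):
    """Height of the histogram rectangle for the class [lower, upper)."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]
for (lower, upper, f), d in zip(classes, densities):
    # The area of each rectangle (width x height) recovers the class frequency.
    area = d * (upper - lower)
    print(f"[{lower} - {upper}): frequency = {f}, density = {d:.2f}, area = {area:.0f}")
```

The class [10 - 30) has the largest frequency, but the class [30 - 40) has the tallest rectangle, which is exactly why heights must be densities when widths differ.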
Example
Table 1.6.2.1
Note [20 – 30) means ‘from 20 to 30, including 20 but excluding 30’.
Fig. 1.6.2.2: Histogram of frequency density against temperature (degrees Fahrenheit).
Example
Temperature Frequency
[0 – 10) 2
[10 – 20) 7
[20 – 30) 11
[30 – 40) 17
[40 – 50) 9
[50 – 60) 3
[60 – 70) 1
Total 50
Table 1.7.1.1
Fig. 1.7.1.2: Frequency against temperature for the data of Table 1.7.1.1.
The frequency polygon may also be directly drawn by finding the points
on the figure. The x-coordinate of each point is the mid-class value of the cell
whilst the y-coordinate is the frequency of the cell (or frequency density if class
intervals are unequal). Successive points are then linked by means of line
segments.
In that state, the polygon would be ‘hanging in the air’, that is, it would
not touch the x-axis. To satisfy this ultimate requirement, we determine its left
(right) x-intercept by subtracting (adding) the class interval of the
first (last) class from (to) the x-coordinate of the first (last) point.
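The construction just described can be sketched in a few lines of Python. The data are the temperature classes of Table 1.7.1.1; the variable names are illustrative only.

```python
# Classes and frequencies from Table 1.7.1.1: (lower limit, upper limit, frequency).
classes = [(0, 10, 2), (10, 20, 7), (20, 30, 11), (30, 40, 17),
           (40, 50, 9), (50, 60, 3), (60, 70, 1)]

# One point per class: (mid-class value, frequency).
points = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Close the polygon on the x-axis: move one class interval to the left of the
# first point and one class interval to the right of the last point.
first_width = classes[0][1] - classes[0][0]
last_width = classes[-1][1] - classes[-1][0]
polygon = ([(points[0][0] - first_width, 0)]
           + points
           + [(points[-1][0] + last_width, 0)])
print(polygon)
```

Joining successive entries of `polygon` with line segments gives the frequency polygon. With unequal class intervals the frequencies would be replaced by frequency densities.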
Example
Using the data from Table 1.6.1.1, we have the following polygon:
Fig. 1.7.2: Frequency polygon of number of students (frequency) against age of students, with classes 20.5 – 25.5, 25.5 – 30.5, ..., 55.5 – 60.5.
1.8 OGIVES
Definition 1
Definition 2
Note For the rest of this course, we will denote ‘cumulative frequency’ by CF.
Example
Table 1.8.1
Note Careful inspection of Table 1.8.1 reveals that the ‘less than’ CF of a class
is also the overall rank of the last observation in that class. This is a very
important finding since it will be of tremendous help to us when
calculating percentiles.
The points on a ‘less than’ CF ogive have upper real limits for x-
coordinates and ‘less than’ CF for y-coordinates. This is quite easy to remember:
‘less than’ CFs are defined according to upper real limits!
If we use the data from Table 1.8.1, the following ‘less than’ CF curve is
obtained.
Fig. 1.8.2: ‘Less than’ CF ogive, with real limits from 20.5 to 60.5 on the horizontal axis.
Note The ‘less than’ CF ogive has an x-intercept equal to the lower real limit of
the first class.
The points on a ‘more than’ CF ogive have lower real limits for x-coordinates
and ‘more than’ CF for y-coordinates. Remember that ‘more than’
CFs are defined according to lower real limits!
Again, if we use the data from Table 1.8.1, the following ‘more than’ CF
curve is obtained.
Fig. 1.8.3: ‘More than’ CF ogive, with real limits from 20.5 to 60.5 on the horizontal axis.
Note The ‘more than’ CF ogive has an x-intercept equal to the upper real limit of
the last class.
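Both kinds of cumulative frequency, and the points of the two ogives, can be obtained as sketched below. Since the figures of Table 1.8.1 are not reproduced here, the sketch uses the temperature data of Table 1.7.1.1 and treats its class limits as real limits; that choice and the variable names are assumptions made for the illustration.

```python
from itertools import accumulate

# Classes and frequencies from Table 1.7.1.1 (class limits used as real limits).
lower_limits = [0, 10, 20, 30, 40, 50, 60]
upper_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [2, 7, 11, 17, 9, 3, 1]

# 'Less than' CF: running total from the first class, paired with upper limits.
less_than_cf = list(accumulate(frequencies))
# 'More than' CF: running total from the last class, paired with lower limits.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Each ogive touches the x-axis at one extra point (its x-intercept).
less_than_points = [(lower_limits[0], 0)] + list(zip(upper_limits, less_than_cf))
more_than_points = list(zip(lower_limits, more_than_cf)) + [(upper_limits[-1], 0)]

print(less_than_points)
print(more_than_points)
```

The last ‘less than’ CF and the first ‘more than’ CF both equal the total number of observations, which gives a simple check on the running totals.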
Stem and leaf diagrams, or stemplots, are used to represent raw data, that
is, individual observations, without loss of information. The ‘leaves’ in the
diagram are actually the last digits of the values (observations) while the ‘stems’
are the remaining part of the values. For example, the value 117 would be split as
‘11’, the stem, and ‘7’, the leaf. By splitting all the values and distributing them
appropriately, we form a stemplot. The example in Section 1.9.1 would be a better
illustration of the above explanation.
Example
84 17 38 45 47
53 76 54 75 22
66 65 55 54 51
44 39 19 54 72
Table 1.9.1.1
In the first instance, the data is classified in the order that it appears on a
stemplot (see Fig. 1.9.1.2). The leaves are then arranged in ascending order (see
Fig. 1.9.1.3) – this is indeed a very practical way of arranging a set of data in
order if the number of observations is not very large.
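The two stages (splitting the values in the order in which they appear, then ordering the leaves) can be sketched in Python as follows, using the observations of Table 1.9.1.1. The names `stems` and `leaves` are illustrative.

```python
from collections import defaultdict

# Observations from Table 1.9.1.1, in the order in which they appear.
data = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
        66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

stems = defaultdict(list)
for value in data:
    stem, leaf = divmod(value, 10)   # last digit is the leaf, the rest is the stem
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = sorted(stems[stem])     # arrange the leaves in ascending order
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading the ordered leaves row by row returns the whole data set in ascending order, which is why no information is lost.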
Example
FRENCH 75 69 58 58 46 44 32 50 53 78
81 61 61 45 31 44 53 66 47 57
ENGLISH 52 58 68 77 38 85 43 44 56 65
65 79 44 71 84 72 63 69 72 79
Table 1.9.2.1
Fig. 1.9.2.2
From Fig. 1.9.2.2, we can deduce that pupils performed better in English
than in French (since they had higher marks in English given the negative
skewness of the distribution).
Merits
1. Minimum value
2. Lower quartile
3. Median
4. Upper quartile
5. Maximum value
Example
Using the data from Table 1.9.1.1 in Section 1.9.1, we have the following
five summary statistics:
Minimum 17
Lower quartile 40.25
Median 53.5
Upper quartile 65.75
Maximum 84
Fig. 1.10.1: Boxplot of the data on a scale from 0 to 100.
Apart from the five descriptive statistics, we can deduce the following
about the distribution:
1. The range – the numerical difference between the maximum and the
minimum values.
2. The inter-quartile range – the difference between the upper and lower
quartiles. It measures the dispersion for the middle 50% of the distribution.
3. The skewness of the distribution – if the median is closer to the lower
(upper) quartile, the distribution is positively (negatively) skewed. If it is
exactly in the middle of those quartiles, the distribution is symmetrical.
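The five summary statistics and the three deductions above can be reproduced with Python's `statistics.quantiles`. Choosing `method="exclusive"` is an assumption about the quartile convention; it is used here because it places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the ordered data.

```python
from statistics import quantiles

# Observations from Table 1.9.1.1.
data = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
        66, 65, 55, 54, 51, 44, 39, 19, 54, 72]

# Quartiles by interpolation between neighbouring ordered observations.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
five_numbers = (min(data), q1, q2, q3, max(data))

data_range = max(data) - min(data)
iqr = q3 - q1
# A median closer to the lower quartile suggests positive skewness,
# a median closer to the upper quartile suggests negative skewness.
if q2 - q1 < q3 - q2:
    skew = "positive"
elif q2 - q1 > q3 - q2:
    skew = "negative"
else:
    skew = "symmetrical"
print(five_numbers, data_range, iqr, skew)
```

Other quartile conventions (for example `method="inclusive"`) can give slightly different quartiles for the same data, so the convention should be stated whenever a boxplot is reported.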
Several boxplots may even be plotted on the same axes for comparison
purposes. We might wish to compare marks obtained by students in French and
English so as to study any similarities and differences between their performances
in these subjects.
Fig. 1.10.3: Boxplots of the French and English marks on a common scale from 0 to 100 (number of marks).
Just imagine that we wish to know whether the length of a metal rod varies
with temperature. We may choose to record the length of the rod at various
temperatures. It is clear here that ‘temperature’ is the independent variable and
‘length’ is the dependent one. These data are kept in the form of a table in which
‘temperature’ and ‘length’ are labelled as X and Y respectively. We next plot the
corresponding pairs of readings in (x, y) form on a graph, the scatter diagram. Fig.
1.11.2 is an example of a scatterplot.
Example
Temperature (°C) 13 50 63 58 20 78 39 55 29 62
Length (cm) 5.10 5.68 5.85 5.74 5.25 5.98 5.59 5.73 5.46 5.81
Table 1.11.1
Fig. 1.11.2: Scatter diagram of length against temperature.
Example
The following data represent the annual sales of petrol in Iraq in millions
of dollars for the period 1985-96.
Year (19_) 85 86 87 88 89 90 91 92 93 94 95 96
Sales ($m) 600 840 420 720 640 860 420 740 670 900 430 760
Table 1.12.1
Fig. 1.12.2: Sales ($m) against year, 1985 to 1996.
A time series shows the trend, cycle and seasonality in the behaviour of a
variable. It is a very sophisticated means of forecasting the values of the variable
on the assumption that history repeats itself.
Example
Table 1.13.1 below refers to tax paid by people in various income groups
in a sample. Construct a Lorenz curve for the data and comment on it.
Table 1.13.1
The above table now should be altered in such a way that relative
cumulative frequencies may now be displayed for both variables, that is, ‘number
of people’ and ‘tax paid’. We must change the labels for the first column,
determine the cumulative frequencies and then convert these to percentages
(proportions) as shown in Table 1.13.2.
Table 1.13.2
Fig. 1.13.3: Lorenz curve of proportion of tax paid against proportion of taxpayers, shown with the line of uniform distribution.
The further the curve is from the line of uniform distribution, the more
uneven is the distribution. It can be observed, for example, that approximately
36% of the population of taxpayers pays only 10% of the total tax. This shows a
considerable degree of unevenness in the population. In an ideal situation, 36% of
the population would have paid 36% of the total tax.
1.14 Z-CHARTS
The annual moving total is the sum of the values of the variable for the
12-month period up to the end of the month under consideration. A line for the
budget for the year to date may be added to a Z-chart, for comparison with the
cumulative sum of actual values.
Example
The sales figures for a company for 2002 and 2003 are as follows.
Table 1.14.1
Table 1.14.2 will now include the cumulative sales for 2003 and the
annual moving total, that is, the 12-month period will be updated from the period
Jan-Dec 2002 to Feb 2002-Jan 2003, then Mar 2002-Feb 2003 and so on until
Jan-Dec 2003, whilst these total sales will be continuously calculated and
recorded.
Note Z-charts do not have to cover 12 months of a year. They could, for
example, also be drawn for four quarters of a year or seven days of a
week.
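The three series of a Z-chart (monthly values, cumulative values and annual moving totals) can be computed as in the sketch below. The monthly sales are hypothetical figures invented for the illustration, not those of Table 1.14.1, and the variable names are assumptions.

```python
# Hypothetical monthly sales for two consecutive years, January to December
# (illustrative values, not the figures of Table 1.14.1).
previous_year = [10, 12, 11, 13, 15, 14, 16, 18, 17, 15, 13, 12]
current_year = [11, 13, 12, 15, 16, 16, 17, 20, 18, 17, 14, 14]

cumulative = []
moving_total = []
running = 0
for month, sales in enumerate(current_year):
    running += sales
    cumulative.append(running)
    # 12-month window: the remaining months of the previous year plus the
    # months of the current year up to and including this one.
    moving_total.append(sum(previous_year[month + 1:]) + running)

for m, (s, c, t) in enumerate(zip(current_year, cumulative, moving_total), start=1):
    print(f"month {m:2d}: monthly = {s:3d}, cumulative = {c:4d}, moving total = {t:4d}")
```

In the last month the cumulative value and the annual moving total coincide, which is where the two upper strokes of the ‘Z’ meet.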
Table 1.14.2
Fig. 1.14.3
Interpretation of Z-charts
1. Monthly totals show the monthly results at a glance with any seasonal
variations.
2. Cumulative totals show the performance to date and can be easily compared
with planned and budgeted performance by superimposing the budget line.
3. Annual moving totals compare the current levels of performance with those
of the previous year. If the line is rising, then this year’s monthly results are
better than the results of the corresponding month last year. The opposite
applies if the line is falling. The annual moving total line indicates the long-
term trend of the variable, whether rising, falling or steady.
Note While the values of the annual moving total and the cumulative values are
plotted on month-end positions, the values for the current monthly figures
are plotted on mid-month positions. This is because monthly figures
represent achievement over a particular month whereas the annual moving
totals and the cumulative values represent achievement up to a particular
month end.
LEVELS/SCALES OF MEASUREMENTS
There are four different scales of measurement. The data can be defined as being one of the four
scales. The four types of scales are:
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
Nominal Scale
A nominal scale is the 1st level of measurement scale in which the numbers serve as “tags” or
“labels” to classify or identify the objects. A nominal scale usually deals with the non-numeric
variables or the numbers that do not have any value.
A nominal scale variable is classified into two or more categories. In this measurement
mechanism, the answer should fall into either of the classes.
It is qualitative. The numbers are used here to identify the objects.
The numbers don’t define the object characteristics. The only permissible aspect of
numbers in the nominal scale is “counting.”
Example:
M- Male = 0
F- Female = 1
Here, the variables are used as tags, and the answer to this question should be either M or F.
Ordinal Scale
The ordinal scale is the 2nd level of measurement that reports the ordering and ranking of data
without establishing the degree of variation between them. Ordinal represents the “order.”
Ordinal data is known as qualitative data or categorical data. It can be grouped, named and also
ranked.
Example:
o Totally disagree
Interval Scale
The interval scale is the 3rd level of measurement scale. It is defined as a quantitative
measurement scale in which the difference between two values is meaningful. In other
words, the variables are measured in exact units rather than in a relative way, although
the zero point of the scale is arbitrary.
The interval scale is quantitative as it can quantify the difference between the values
It allows calculating the mean and median of the variables
To understand the difference between the variables, you can subtract the values between
the variables
The interval scale is the preferred scale in Statistics as it helps to assign any numerical
values to arbitrary assessment such as feelings, calendar types, etc.
Example:
Likert Scale
Net Promoter Score (NPS)
Bipolar Matrix Table
Ratio Scale
The ratio scale is the 4th level of measurement scale, which is quantitative. It is a type of variable
measurement scale. It allows researchers to compare the differences or intervals. The ratio scale
has a unique feature. It possesses the character of the origin or zero points.
It affords unique opportunities for statistical analysis. The variables can be orderly added,
subtracted, multiplied, divided. Mean, median, and mode can be calculated using the ratio
scale.
Ratio scale has unique and useful properties. One such feature is that it allows unit
conversions like kilogram – calories, gram – calories, etc.
Example:
In this section, we consider experiments with multiple outcomes. The probability of each
outcome is fixed.
Definition: A chi-square goodness-of-fit test is used to test whether a frequency distribution
obtained experimentally fits an “expected” frequency distribution that is based on
the theoretical or previously known probability of each outcome.
An experiment is conducted in which a simple random sample is taken from a population,
and each member of the sample is grouped into exactly one of k categories.
Step 1: The observed frequencies are calculated for the sample.
Step 2: The expected frequencies are obtained from previous knowledge (or belief) or
probability theory. In order to proceed to the next step, it is necessary that each expected
frequency is at least 5.
Step 3: A hypothesis test is performed:
(i) The null hypothesis H0 : the population frequencies are equal to the expected frequencies.
(ii) The alternative hypothesis, Ha : the null hypothesis is false (what does this imply about
the population frequencies?).
Example 1: Researchers have conducted a survey of 1600 coffee drinkers asking how much
coffee they drink in order to confirm previous studies. Previous studies have indicated that
72% of Americans drink coffee. The results of previous studies (left) and the survey (right)
are below. At α = 0.05, is there enough evidence to conclude that the distributions are the
same?
Response          % of Coffee Drinkers (previous studies)   Frequency (survey)
2 cups per week   15%                                       206
1 cup per week    13%                                       193
1 cup per day     27%                                       462
2+ cups per day   45%                                       739
(i) The null hypothesis H0 :the population frequencies are equal to the expected frequencies
(to be calculated below).
(iii) α = 0.05.
Response          % of Coffee Drinkers   E                     O     O − E   (O − E)²   (O − E)²/E
2 cups per week   15%                    0.15 × 1600 = 240     206   −34     1156       4.817
1 cup per week    13%                    0.13 × 1600 = 208     193   −15     225        1.082
1 cup per day     27%                    0.27 × 1600 = 432     462   30      900        2.083
2+ cups per day   45%                    0.45 × 1600 = 720     739   19      361        0.5014
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 8.483 > 7.815, there is enough
statistical evidence to reject the null hypothesis and to believe that the old percentages
no longer hold.
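The arithmetic of Example 1 can be checked with a few lines of Python. This is a sketch only: the critical value is the tabulated figure quoted in the example, typed in by hand rather than computed, and the variable names are illustrative.

```python
# Data from Example 1: expected proportions (previous studies) and observed counts.
expected_share = [0.15, 0.13, 0.27, 0.45]
observed = [206, 193, 462, 739]
n = sum(observed)                                   # 1600 respondents

expected = [p * n for p in expected_share]          # E = proportion x sample size
assert all(e >= 5 for e in expected)                # condition required in Step 2

# Test statistic: sum of (O - E)^2 / E over the k categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for k - 1 = 3 degrees of freedom at alpha = 0.05, read from a
# chi-square table (the value quoted in the example).
critical_value = 7.815
reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, reject H0: {reject_h0}")
```

The same lines work for any goodness-of-fit problem in this section once `expected_share`, `observed` and `critical_value` are replaced.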
Example 2: A department store, A, has four competitors: B,C,D, and E. Store A hires a
consultant to determine if the percentage of shoppers who prefer each of the five stores
is the same. A survey of 1100 randomly selected shoppers is conducted, and the results
about which one of the stores shoppers prefer are below. Is there enough evidence using a
significance level α = 0.05 to conclude that the proportions are really the same?
Store A B C D E
Number of Shoppers 262 234 204 190 210
(i) The null hypothesis H0 :the population frequencies are equal to the expected frequencies
(to be calculated below).
(iii) α = 0.05.
Preference   % of Shoppers   E                    O     O − E   (O − E)²   (O − E)²/E
A            20%             0.2 × 1100 = 220     262   42      1764       8.018
B            20%             0.2 × 1100 = 220     234   14      196        0.891
C            20%             0.2 × 1100 = 220     204   −16     256        1.163
D            20%             0.2 × 1100 = 220     190   −30     900        4.091
E            20%             0.2 × 1100 = 220     210   −10     100        0.455
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 14.618 > 9.488, there is enough
statistical evidence to reject the null hypothesis and to believe that customers do not
prefer each of the five stores equally.
10.2 Independence
Recall that two events are independent if the occurrence of one of the events has no effect
on the occurrence of the other event.
A chi-square independence test is used to test whether or not two variables are independent.
As in section 10.1, an experiment is conducted in which the frequencies for two variables
are determined. To use the test, the same assumptions must be satisfied: the observed
frequencies are obtained through a simple random sample, and each expected frequency is
at least 5. The frequencies are written down in a table: the columns contain outcomes for
one variable, and the rows contain outcomes for the other variable.
The procedure for the hypothesis test is essentially the same. The differences are that:
(ii) Ha is that the two variables are not independent (they are dependent).
(iii) The expected frequency Er,c for the entry in row r, column c is calculated using:
\[ E_{r,c} = \frac{(\text{sum of row } r) \cdot (\text{sum of column } c)}{\text{sample size}} \]
Example 3: The results of a random sample of children with pain from musculoskeletal
injuries treated with acetaminophen, ibuprofen, or codeine are shown in the table. At α =
0.10, is there enough evidence to conclude that the treatment and result are independent?
(i) The null hypothesis H0 : the treatment and response are independent.
(ii) The alternative hypothesis, Ha : the treatment and response are dependent.
(iii) α = 0.10.
Row, Column   E                      O    O − E    (O − E)²   (O − E)²/E
1,1           200·100/300 = 66.7     58   −8.7     75.69      1.135
1,2           200·100/300 = 66.7     81   14.3     204.49     3.067
1,3           200·100/300 = 66.7     61   −5.7     32.49      0.487
2,1           100·100/300 = 33.3     42   8.7      75.69      2.271
2,2           100·100/300 = 33.3     19   −14.3    204.49     6.135
2,3           100·100/300 = 33.3     39   5.7      32.49      0.975
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 14.07 > 4.605, there is enough
statistical evidence to reject the null hypothesis and to believe that there is a relationship
between the treatment and response.
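The expected frequencies and the test statistic of Example 3 can be recomputed from the observed counts alone, as sketched below in Python. The observed counts are those in the O column of the computation table; the critical value is the tabulated figure quoted in the example, and the variable names are illustrative.

```python
# Observed frequencies from the O column of the computation table in Example 3,
# arranged as 2 rows x 3 columns.
observed = [[58, 81, 61],
            [42, 19, 39]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(observed):
    for c, o in enumerate(row):
        # Expected frequency: (row total)(column total) / sample size.
        e = row_totals[r] * col_totals[c] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1)(columns - 1)
critical_value = 4.605   # chi-square table, df = 2, alpha = 0.10 (as quoted)
print(f"chi-square = {chi_square:.2f}, df = {df}, "
      f"reject H0: {chi_square > critical_value}")
```

Working with unrounded expected frequencies gives a total that agrees with the hand computation to two decimal places, even though individual cells differ slightly from the rounded table entries.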
Practice Problem 1: A doctor believes that the proportions of births in this country on each
day of the week are equal. A simple random sample of 700 births from a recent year is
selected, and the results are below. At a significance level of 0.01, is there enough evidence
to support the doctor’s claim?
(i) The null hypothesis H0 :the population frequencies are equal to the expected frequencies
(to be calculated below).
(iii) α = 0.01.
Day         E               O     O − E   (O − E)²   (O − E)²/E
Sunday      700/7 = 100     65    −35     1225       12.25
Monday      700/7 = 100     103   3       9          0.09
Tuesday     700/7 = 100     114   14      196        1.96
Wednesday   700/7 = 100     116   16      256        2.56
Thursday    700/7 = 100     115   15      225        2.25
Friday      700/7 = 100     112   12      144        1.44
Saturday    700/7 = 100     75    −25     625        6.25
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 26.8 > 16.812, there is enough
statistical evidence to reject the null hypothesis and to believe that the proportion of
births is not the same for each day of the week.
Practice Problem 2: The side effects of a new drug are being tested against a placebo. A
simple random sample of 565 patients yields the results below. At a significance level of
α = 0.05, is there enough evidence to conclude that the treatment is independent of the side
effect of nausea?
(i) The null hypothesis H0 : the treatment and response are independent.
(ii) The alternative hypothesis, Ha : the treatment and response are dependent.
(iii) α = 0.05.
Row, Column   E                        O     O − E    (O − E)²   (O − E)²/E
1,1           49·290/565 = 25.15       36    10.85    117.72     4.681
1,2           49·275/565 = 23.85       13    −10.85   117.72     4.936
2,1           516·290/565 = 264.85     254   −10.85   117.72     0.444
2,2           516·275/565 = 251.15     262   10.85    117.72     0.469
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 10.53 > 3.841, there is enough
statistical evidence to reject the null hypothesis and to believe that there is a relationship
between the treatment and response.
Practice Problem 3: Suppose that we have a 6-sided die. We assume that the die is unbiased
(upon rolling the die, each outcome is equally likely). An experiment is conducted in which
the die is rolled 240 times. The outcomes are in the table below. At a significance level of
α = 0.05, is there enough evidence to support the hypothesis that the die is unbiased?
Outcome 1 2 3 4 5 6
Frequency 34 44 30 46 51 35
(i) The null hypothesis H0 : each face is equally likely to be the outcome of a single roll.
(iii) α = 0.05.
Face   E              O    O − E   (O − E)²   (O − E)²/E
1      240/6 = 40     34   −6      36         0.9
2      240/6 = 40     44   4       16         0.4
3      240/6 = 40     30   −10     100        2.5
4      240/6 = 40     46   6       36         0.9
5      240/6 = 40     51   11      121        3.025
6      240/6 = 40     35   −5      25         0.625
(vii) Is there enough evidence to reject H0 ? Since χ2 ≈ 8.35 < 15.086, we fail to reject the
null hypothesis, that the die is fair.
CGN 3421 - Computer Methods Gurley
2) Curve fitting - capturing the trend in the data by assigning a single function across the entire range.
The example below uses a straight line function, f(x) = ax + b, for the entire range.
The goal is to identify the coefficients ‘a’ and ‘b’ such that f(x) ‘fits’ the data well
[Figure: sample data sets: height of dropped object vs. time; oxygen in soil vs. temperature; pore pressure; profit]
How can we pick the coefficients that best fit the line to the data?
Why does the blue line appear to us to fit the trend better?
• Consider the distance between the data and points on the line
• Add up the length of all the red and blue vertical lines
• The one line that provides a minimum error is then the ‘best’
straight line
\[ \mathrm{err} = \sum (d_i)^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + (y_3 - f(x_3))^2 + (y_4 - f(x_4))^2 \]
\[ \mathrm{err} = \sum_{i=1}^{\#\ \text{data points}} (y_i - f(x_i))^2 = \sum_{i=1}^{\#\ \text{data points}} (y_i - (a x_i + b))^2 \]
The ‘best’ line has minimum error between line and data points
This is called the least squares approach, since we minimize the square of the error.
# data points = n
minimize
\[ \mathrm{err} = \sum_{i=1}^{n} (y_i - (a x_i + b))^2 \]
\[ \frac{\partial\,\mathrm{err}}{\partial a} = -2 \sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0 \]
\[ \frac{\partial\,\mathrm{err}}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0 \]
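Solve for a and b so that the previous two equations both = 0
re-write these two equations
\[ a \sum x_i^2 + b \sum x_i = \sum (x_i y_i) \]
\[ a \sum x_i + b\,n = \sum y_i \]
put these into matrix form
\[ \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \end{bmatrix} \]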
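what’s unknown?
we have the data points (x_i, y_i) for i = 1, ..., n, so we have all the summation terms in the matrix
\[ A = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, \quad X = \begin{bmatrix} b \\ a \end{bmatrix}, \quad B = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \end{bmatrix} \]
so
\[ AX = B \]
using built in Mathcad matrix inversion, the coefficients a and b are solved
>> X = A^(-1)*B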
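Example with n = 6 data points (i = 1, 2, ..., 6):
\[ \begin{bmatrix} 6 & 7.5 \\ 7.5 & 13.75 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 20.8593 \\ 41.6584 \end{bmatrix} \]
\[ \begin{bmatrix} b \\ a \end{bmatrix} = \mathrm{inv}\begin{bmatrix} 6 & 7.5 \\ 7.5 & 13.75 \end{bmatrix} \begin{bmatrix} 20.8593 \\ 41.6584 \end{bmatrix} = \begin{bmatrix} -0.975 \\ 3.561 \end{bmatrix} \]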
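The 2×2 system for the straight-line fit is small enough to solve by hand with Cramer's rule, which gives a quick check on the matrix-inversion result. The following is a minimal sketch in Python, not the Mathcad procedure used in these notes. The function names are illustrative, the sums n = 6, Σx = 7.5, Σx² = 13.75, Σy = 20.8593 and Σxy = 41.6584 are the values read from the example, and the second data set (points on y = 2x + 1) is hypothetical.

```python
def solve_line_normal_equations(n, sum_x, sum_x2, sum_y, sum_xy):
    """Solve [[n, Sx], [Sx, Sx2]] [b, a]^T = [Sy, Sxy]^T and return (a, b)."""
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("x values must not all be equal")
    b = (sum_x2 * sum_y - sum_x * sum_xy) / det  # intercept
    a = (n * sum_xy - sum_x * sum_y) / det       # slope
    return a, b


def fit_line(xs, ys):
    """Least squares fit of f(x) = a*x + b; builds the sums, then solves."""
    return solve_line_normal_equations(
        len(xs),
        sum(xs),
        sum(x * x for x in xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


# Sums as read from the example above.
a, b = solve_line_normal_equations(6, 7.5, 13.75, 20.8593, 41.6584)
print(round(a, 3), round(b, 3))

# Hypothetical data lying exactly on y = 2x + 1, to show the fit recovers the line.
a_exact, b_exact = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The result agrees with the coefficients quoted in the example to about two decimal places; the small difference comes from rounding in the printed sums.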
We started the linear curve fit by choosing a generic form of the straight line f(x) = ax + b
This is just one kind of function. There are an infinite number of generic forms we could choose from for
almost any shape we want. Let’s start with a simple extension to the linear regression concept
recall the examples of sampled data
[Figure: sample data sets: height of dropped object vs. time; oxygen in soil vs. temperature; pore pressure; profit]
How can we pick the coefficients that best fit the curve to the
data? We can use the same idea:
The curve that gives minimum error between the data y and the fit
f(x) is ‘best’
The general expression for any error using the least squares approach is
\[ \mathrm{err} = \sum (d_i)^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + (y_3 - f(x_3))^2 + (y_4 - f(x_4))^2 \tag{2} \]
where we want to minimize this error. Now substitute the form of our eq. (1)
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_j x^j = a_0 + \sum_{k=1}^{j} a_k x^k \tag{1} \]
into the general least squares error eq. (2)
\[ \mathrm{err} = \sum_{i=1}^{n} \left( y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \dots + a_j x_i^j \right) \right)^2 \tag{3} \]
where: n - # of data points given, i - the current data point being summed, j - the polynomial order
re-writing eq. (3)
\[ \mathrm{err} = \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right)^2 \tag{4} \]
find the best line = minimize the error (squared distance) between line and data points
Find the set of coefficients a_k, a_0 so we can minimize eq. (4)
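CALCULUS TIME
To minimize eq. (4), take the derivative with respect to each coefficient a_0, a_k (k = 1, ..., j) and set each to
zero
\[ \frac{\partial\,\mathrm{err}}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) = 0 \]
\[ \frac{\partial\,\mathrm{err}}{\partial a_1} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i = 0 \]
\[ \frac{\partial\,\mathrm{err}}{\partial a_2} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i^2 = 0 \]
\[ \vdots \]
\[ \frac{\partial\,\mathrm{err}}{\partial a_j} = -2 \sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{k=1}^{j} a_k x_i^k \right) \right) x_i^j = 0 \]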
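Written in matrix form, these j + 1 equations are:
\[
\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^j \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{j+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{j+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_i^j & \sum x_i^{j+1} & \sum x_i^{j+2} & \cdots & \sum x_i^{j+j}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_j \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \\ \sum (x_i^2 y_i) \\ \vdots \\ \sum (x_i^j y_i) \end{bmatrix}
\]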
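what’s unknown?
\[
A = \begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^j \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{j+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{j+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_i^j & \sum x_i^{j+1} & \sum x_i^{j+2} & \cdots & \sum x_i^{j+j}
\end{bmatrix}, \quad
X = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_j \end{bmatrix}, \quad
B = \begin{bmatrix} \sum y_i \\ \sum (x_i y_i) \\ \sum (x_i^2 y_i) \\ \vdots \\ \sum (x_i^j y_i) \end{bmatrix}
\]
where all summations above are over i = 1, ..., n data points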
Note: No matter what the order j, we always get equations LINEAR with respect to the coefficients.
This means we can use the following solution method
\[ AX = B \]
using built in Mathcad matrix inversion, the coefficients a_0, a_1, ..., a_j are solved
>> X = A^(-1)*B
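Example #1:
i 1 2 3 4 5 6
For a second order polynomial (j = 2), the matrix form to solve is:
\[
\begin{bmatrix}
n & \sum x_i & \sum x_i^2 \\
\sum x_i & \sum x_i^2 & \sum x_i^3 \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}
\]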
Now plug in the given data.
Before we go on...what answers do you expect for the coefficients after looking at the data?
n = 6
Σ x_i = 7.5, Σ y_i = 13.75
Σ x_i^2 = 13.75, Σ x_i y_i = 28.125
Σ x_i^3 = 28.125, Σ x_i^2 y_i = 61.1875
Σ x_i^4 = 61.1875
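\[
\begin{bmatrix} 6 & 7.5 & 13.75 \\ 7.5 & 13.75 & 28.125 \\ 13.75 & 28.125 & 61.1875 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} 13.75 \\ 28.125 \\ 61.1875 \end{bmatrix}
\]
Note: we are using Σ x_i^2, NOT (Σ x_i)^2. There’s a big difference
using the inversion method
\[
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= \mathrm{inv}\begin{bmatrix} 6 & 7.5 & 13.75 \\ 7.5 & 13.75 & 28.125 \\ 13.75 & 28.125 & 61.1875 \end{bmatrix}
\begin{bmatrix} 13.75 \\ 28.125 \\ 61.1875 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\quad \Longrightarrow \quad f(x) = 0 + 0 \cdot x + 1 \cdot x^2
\]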
This fits the data exactly. That is, f(x) = y since y = x^2
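x = [0 0.5 1 1.5 2 2.5]
y = [0.0674 -0.9156 1.6253 3.0377 3.3535 7.9409]
The resulting system to solve is:
\[
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= \mathrm{inv}\begin{bmatrix} 6 & 7.5 & 13.75 \\ 7.5 & 13.75 & 28.125 \\ 13.75 & 28.125 & 61.1875 \end{bmatrix}
\begin{bmatrix} 15.1093 \\ 32.2834 \\ 71.276 \end{bmatrix}
\]
giving:
\[ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} -0.1812 \\ -0.3221 \\ 1.3537 \end{bmatrix} \]
\[ f(x) = -0.1812 - 0.3221\,x + 1.3537\,x^2 \]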
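The two polynomial examples can be reproduced by building the normal-equation matrix directly from the data. This is a minimal sketch in plain Python, offered as an illustration rather than the Mathcad workflow used in these notes. The function names fit_polynomial and solve_linear_system are hypothetical, Gaussian elimination with partial pivoting stands in for the built-in matrix inversion, and the x values 0, 0.5, ..., 2.5 with y = x^2 are assumed for the first example from its sums and stated result.

```python
def solve_linear_system(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(B)
    # Work on an augmented copy so the inputs are left unchanged.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, B)]
    for col in range(n):
        # Bring the row with the largest entry in this column up to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    X = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * X[c] for c in range(r + 1, n))
        X[r] = (M[r][n] - tail) / M[r][r]
    return X


def fit_polynomial(xs, ys, order):
    """Return [a_0, ..., a_order] from the normal equations A X = B."""
    # A[r][c] = sum of x_i^(r + c); B[r] = sum of x_i^r * y_i
    A = [[sum(x ** (r + c) for x in xs) for c in range(order + 1)]
         for r in range(order + 1)]
    B = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(order + 1)]
    return solve_linear_system(A, B)


xs = [0, 0.5, 1, 1.5, 2, 2.5]
# First example: y = x^2 exactly (assumed data), so the fit should return 0, 0, 1.
exact = fit_polynomial(xs, [x ** 2 for x in xs], 2)
# Second example: the y values listed above.
noisy = fit_polynomial(xs, [0.0674, -0.9156, 1.6253, 3.0377, 3.3535, 7.9409], 2)
print([round(v, 4) for v in exact])
print([round(v, 4) for v in noisy])
```

For high polynomial orders the normal-equation matrix becomes badly conditioned, so this direct approach is best kept to low orders like the ones used here.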
Overfit - over-doing the requirement for the fit to ‘match’ the data trend (order too high)
Polynomials become more ‘squiggly’ as their order increases. A ‘squiggly’ appearance comes from
inflections in the function.
Consideration #1:
Consideration #2:
General rule: pick a polynomial form at least several orders lower than the number of data points.
Start with linear and add order until trends are matched.
Underfit - the order is too low to capture obvious trends in the data
General rule: View data first, then select an order that reflects inflections, etc.
The next line references a separate worksheet with a function inside called
Create_Vector. I can use the function here as long as I reference the worksheet first
Reference:C:\Mine\Mathcad\Tutorials\MyFunctions.mcd
f2 := regress ( X , Y , 2) f3 := regress ( X , Y , 3)
[Figure: data with the 2nd order and 3rd order polynomial fits; vertical axis 0 to 300]
Note that neither 2nd nor 3rd order fit really describes the data well, but higher order will only get more
‘squiggly’
We created this sample of data using an exponential function. Why not create a general form of the
exponential function, and use the error minimization concept to identify its coefficients? That is, let’s replace
the polynomial equation
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_j x^j = a_0 + \sum_{k=1}^{j} a_k x^k \]
with a general exponential equation
\[ f(x) = C e^{Ax} = C \exp(Ax) \]
where we will seek C and A such that this equation fits the data as best it can.
Again with the error: solve for the coefficients C, A such that the error is minimized:
minimize
\[ \mathrm{err} = \sum_{i=1}^{n} \left( y_i - C \exp(A x_i) \right)^2 \]
Problem: When we take partial derivatives of err with respect to C and A and set them to zero, we get
two NONLINEAR equations with respect to C, A
Now what?
Solution #1: Nonlinear equation solving methods
Remember we used Newton Raphson to solve a single nonlinear equation? (root finding)
We can use Newton Raphson to solve a system of nonlinear equations.
Is there another way? For the exponential form, yes there is
1) Take logarithm of both sides to get rid of the exponential: ln(y) = ln(C e^{Ax}) = Ax + ln(C)
2) Introduce the following change of variables: Y = ln(y), X = x, B = ln(C)
The original data points in the x − y plane get mapped into the X − Y plane.
This is called data linearization. The data is transformed as: (x, y) ⇒ (X, Y) = (x, ln(y))
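Now we use the method for solving a first order linear curve fit
\[ \begin{bmatrix} n & \sum X \\ \sum X & \sum X^2 \end{bmatrix} \begin{bmatrix} B \\ A \end{bmatrix} = \begin{bmatrix} \sum Y \\ \sum XY \end{bmatrix} \]
for A and B, where above Y = ln(y), and X = x
Finally, we operate on B = ln(C) to solve C = e^B
And we now have the coefficients for y = C e^{Ax}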
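The linearization steps above can be sketched in code: take logarithms, fit a straight line to (x, ln y), then convert the intercept back with C = e^B. This is a minimal Python illustration under stated assumptions, not the Mathcad worksheet from these notes. The function name fit_exponential is hypothetical, and the test data are generated noise-free from C = 1.6 and A = 1.3 purely to show that the procedure recovers known coefficients.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = C * exp(A * x) by a straight-line fit of Y = ln(y) against X = x."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take ln(y)")
    X = list(xs)
    Y = [math.log(y) for y in ys]  # data linearization: (x, y) -> (x, ln y)
    n = len(X)
    sum_X = sum(X)
    sum_X2 = sum(v * v for v in X)
    sum_Y = sum(Y)
    sum_XY = sum(p * q for p, q in zip(X, Y))
    # Solve [[n, SX], [SX, SX2]] [B, A]^T = [SY, SXY]^T by Cramer's rule.
    det = n * sum_X2 - sum_X ** 2
    A = (n * sum_XY - sum_X * sum_Y) / det
    B = (sum_X2 * sum_Y - sum_X * sum_XY) / det
    C = math.exp(B)  # undo B = ln(C)
    return C, A


# Hypothetical noise-free data generated from y = 1.6 * exp(1.3 * x).
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1.6 * math.exp(1.3 * x) for x in xs]
C, A = fit_exponential(xs, ys)
print(round(C, 4), round(A, 4))
```

One design point worth noting: this minimizes the squared error in ln(y), not in y itself, so with noisy data the coefficients can differ somewhat from those of a direct nonlinear least squares fit.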
Example: repeat previous example, add exponential fit
f2 := regress(X, Y, 2)    f3 := regress(X, Y, 3)
C := exp(coeff_1)    A := coeff_2    fitexp(x) := C ⋅ exp(A ⋅ x)    i := −2, −1.9 .. 4
A = 1.3
C = 1.6
[Figure: data with the 2nd order polynomial fit and the exponential fit; vertical axis 0 to 300]
Chapter 8
An Introduction to
Questionnaire Design
Introduction
In this chapter you will learn about:
• The key principles of designing effective questionnaires.
• How to formulate meaningful questions.
• The use of structured, semi-structured and unstructured
questionnaires in different types of research design.
• The three most important types of questions for asking
about behaviour, attitudes or classifying respondents
• Key terms used in questionnaire design
• The link between the interviewer, the respondent and the
questionnaire.
Step 2 – Make a rough listing of the questions
A list is now made of all the questions that could go into the ques-
tionnaire. The aim at this stage is to be as comprehensive as possi-
ble in the listing and not to worry about the phrasing of the
questions. That comes next.
interviews are to be used; self completed if it will be a self comple-
tion questionnaire). Time and money can preclude a proper pilot so
at the very least it should be tested on one or two colleagues for
sense, flow and clarity of instructions. The whole purpose of the test
is to find out if changes are needed so that final revisions can be
made. When carrying out the pilot it is best to run through the
questionnaire with the guinea pig respondent and then go back
over the questions and ask for each one, “what was going through
your mind when you were asked this question?”.
Questionnaire design is one of the hardest and yet one of the most
important parts of the market research process. Given the same
objectives, two researchers would probably never design the same
questionnaire.
know if they will receive anything in return for giving their opin-
ion.
Data-processor – the data processor wants a questionnaire which
will result in data which can be processed efficiently and with min-
imum error.
If questionnaires fail it is usually because they are dashed off with
insufficient thought. Questions may be missed out; they could be
badly constructed, too long, or too complicated and sometimes
unintelligible. Good questionnaires are iterations which begin as a
rough draft and, through constant refinement, are converted to pre-
cise and formatted documents. It is not unusual for a questionnaire
to develop through to version 7 or 8.
There are normally five sections in a questionnaire:
• The respondent’s identification data – such as their name,
address, date of the interview, name of the interviewer. The
questionnaire would also have a unique number for purposes
of entering the data into the computer.
• An introduction – this is the interviewer’s request for help.
It is normally scripted and lays out the credentials of the
market research company, the purpose of the study and any
aspects of confidentiality.
• Instructions – the interviewer and the respondent need to
know how to move through the questionnaire such as which
questions to skip and where to move to if certain answers are
given.
• Information – this is the main body of the document and is
made up of the many questions and response codes.
• Classification data – these questions, sometimes at the front
of the questionnaire, sometimes at the end, establish the
important characteristics of the respondent, particularly
related to their demographics.
Ten things to think about when designing a questionnaire:
and it will generate a rough topic list which will eventually
be converted into more explicit questions.
2. Think about how the interview will be carried out: the
way that the interview will be carried out will have a bearing
on the framing of the questions. For example, interviews
carried out over the telephone have some limitations
compared with face to face interviews. Self-completion
questionnaires need to be very precise and explicit in the
way they are designed.
3. Think about the introduction to the questionnaire:
scripted introductions can sound “wooden”. However, each
interviewer should say the same thing so there has to be a
standard introduction. It should quickly and succinctly
communicate the purpose of the survey, any aspects of
confidentiality and what is required of the respondent. The
introduction is arguably one of the most important
components of a questionnaire because if it fails to engage
with the respondent, there will be no interview at all.
4. Think about the formatting: the questionnaire should be
clear and easy to read. It should be easy for the interviewer
to navigate around. Questions and response options should
be laid out in a standard format and if the questionnaire is
to be administered on a doorstep in winter, the typeface
should be large enough to read. Where appropriate, there
should be ample space to write in open ended comments.
There should be somewhere (front or back) to write down
the details of the respondent, the date of the interview and
the name of the interviewer.
5. Think about questions from the respondents’ point of
view: questions should be framed in a respondent friendly
manner. Researchers usually know what they want from a
survey but this seldom converts into a straight question. The
question usually has to be broken down into two or three
parts to make it relevant from the respondent’s point of
view. Furthermore, researchers can be greedy for information
and design questionnaires that are too long and impose
impossible tasks for the respondent.
6. Think about the possible answers at the same time as
thinking about the questions: the whole purpose of a
question is to derive answers and so it is essential that some
thought is given to all the possible replies that could be
received. It is the anticipation of the complete range of
possible answers that throws up the faults in the question.
For example, it is no good asking people how many loaves of
bread they buy in a year if they think in terms of loaves
purchased per week.
7. Think about the order of the questions: the questions
should flow easily from one to another and be grouped into
topics in a logical sequence.
8. Think about the types of questions: texture in the
interview can be achieved by incorporating different styles of
questions. The researcher can choose from open ended
questions, closed questions and scales.
9. Think about how the data will be processed: the
questionnaire is simply the vehicle by which data is
collected from many individuals before being stirred in the
analysis pot. Consideration of how the data will be analysed
at the time of designing the questionnaire will make things
easier later on.
10. Think about interviewer instructions: questionnaires are
administered by interviewers who, skilled as they are,
need clear guidance on what to do at every stage of the
interview. These instructions need to be differentiated
from the text either by capital letters, emboldened or
underlined type.
• Make the questions as simple as possible. Questions should not
only be short, they should also be simple. Those which
include multiple ideas or two questions in one will confuse
and be misunderstood.
• Make the questions very specific. Notwithstanding the
importance of brevity and simplicity, there are occasions
when it is advisable to lengthen the question by adding
memory cues. For example, it is good practice to be specific
with time periods.
• Avoid jargon or shorthand. It cannot be assumed that
respondents will understand words commonly used by
researchers. Trade jargon, acronyms and initials should be
avoided unless they are in everyday use.
• Steer clear of sophisticated or uncommon words. A questionnaire
is not a place to score literary points so only use words in
common parlance. Colloquialisms are acceptable if they will
be understood by everybody (some are highly regional).
• Avoid ambiguous words. Words such as ‘usually’ or
‘frequently’ have no specific meaning and need qualifying.
• Avoid questions with a negative in them. Questions are more
difficult to understand if they are asked in a negative sense.
It is better to say “Do you ever ...?”, as opposed to “Do you
never ...?”.
• Avoid hypothetical questions. It is difficult to answer questions
on imaginary situations. Answers may be given but they
cannot necessarily be trusted.
• Do not use words which could be misheard. This is especially
important when the interview is administered over the
telephone. For example, fifteen and fifty can sound very
similar.
• Desensitise questions by using response bands. Questions which
ask women about their age or companies about their
turnover are best presented as a range of response bands.
This softens the question by indicating that precision isn’t
necessary and a broad answer is acceptable. The data will
almost certainly be grouped into bands at the analysis stage,
so it may as well be collected in this way.
• Ensure that fixed responses do not overlap. The categories which
are used in fixed response questions (such as the age bands
of respondents, the turnover bands of companies etc) should
be sequential and not overlap otherwise some answers will
be caught on the cusp.
• Allow for ‘others’ in fixed response questions. Pre-coded answers
should always allow for a response other than those listed.
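The advice on response bands (sequential, non-overlapping, with an ‘other’ option) can be checked mechanically before a questionnaire is fielded. The following is a minimal sketch in Python, assuming whole-number answers such as ages. The band values and the function names check_bands and band_label are hypothetical and only illustrate the idea.

```python
def check_bands(bands):
    """Return True if each band starts exactly one above where the previous one ends."""
    for (low, high), (next_low, _) in zip(bands, bands[1:]):
        # Only the last band may be open ended (high is None).
        if high is None or low > high or next_low != high + 1:
            return False
    return True


def band_label(value, bands):
    """Return the label of the band containing value, or 'Other' if none does."""
    for low, high in bands:
        if value >= low and (high is None or value <= high):
            return f"{low}+" if high is None else f"{low}-{high}"
    return "Other"


# Hypothetical age bands; None marks an open-ended top band.
good_bands = [(18, 24), (25, 34), (35, 44), (45, None)]
# Overlapping bands: a respondent aged 25 would be "caught on the cusp".
bad_bands = [(18, 25), (25, 35), (35, 45), (45, None)]

print(check_bands(good_bands), check_bands(bad_bands))
print(band_label(25, good_bands), band_label(17, good_bands), band_label(60, good_bands))
```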
Think about
How many questionnaires pass in front of you that you put
straight in the bin? Start collecting them. In time you will have
a good variety from which you can pick and choose questions
and layouts when you have to design a questionnaire.
phone, face-to-face or self completion depending on the respondent
type, the content of questionnaire and the budget.
Semi-structured questionnaires comprise a mixture of closed and open
questions. They are commonly used in business-to-business market
research where there is a need to accommodate a large range of dif-
ferent responses from companies. The use of semi-structured ques-
tionnaires enables a mix of qualitative and quantitative information
to be gathered. They can be administered over the telephone or
face-to-face.
Unstructured questionnaires are made up of questions that elicit free
responses. These are guided conversations rather than structured
interviews and would often be referred to as a “topic guide”. The
topic guide is made up of a list of questions with an apparent order
but is not so rigid that the interviewer has to slavishly follow it in
every detail. The interviewer can probe or even construct new ques-
tions which have not been scripted. This type of questionnaire is
used in qualitative research for depth interviewing (face-to-face,
depth telephone interviews) and they form the basis of many stud-
ies into technical or narrow markets.
Using one of these types of questionnaire, (structured, semi-struc-
tured, or unstructured) a check should be made on how meaningful
it is, by asking “Is it measuring or probing what they think it’s measur-
ing or probing?”. If you get this right respondents will be able to give
valid answers.
Another simple measure is to think through all the possible
responses. This will make sure that the responses that are obtained
are reliable. Basically this means that the answers received should be
the same as those that would be given, if you repeated the question.
There are two major issues that can have a bad effect on both the
quality of your data, and a respondent’s attitude towards market
research. These are using excessively long questionnaires, and repet-
itive questioning techniques. Variety is the spice of questionnaires,
as well as of life! Use lots of different question types to stop respon-
dents getting bored. Stimulus materials, such as show cards and
advertisements, can also help provide texture in the interview.
poses The three different types of information that can be gathered
and the surveys in which they are used are summarised in Figure 8.2.
Behavioural questions
Behavioural questions are designed to find out what people (or
companies) do. For example, do people eat butter or margarine?
How much do they eat? What brands do they buy? Who buys it?
etc. They determine people’s actions in terms of what they have
bought, used, visited, seen, read or heard. Behavioural questions
record facts and not matters of opinion.
• When did you last ........?
• Which do you do most often ........?
• Who does it ........?
• How many ........?
• Do you have ........?
• In what way do you do it ........?
• In the future will you ........?
Attitudinal questions
People hold opinions or beliefs on everything from the products
they buy and the companies which make or supply them through
to social issues and politics. These attitudes are important because
they influence the way people act.
Researchers explore attitudes using questions which, in particular,
begin with the word ‘why...’. Also useful are the questions How?,
Which?, Who?, Where?, What? In attitudinal and motivational research,
these phrases are often used: “Why did you say that?” or “Would you
explain?”.
Very likely
Quite likely
Neither likely nor not likely
Not very likely
Not likely at all.
2. Numerical rating scales. This is a very similar approach to
the verbal rating except the respondent is asked to give a
numerical ‘score’ rather than a semantic response. The scores
are often out of a number with 5, 7 and 10 being popular
choices (where the largest number is best and 1 is worst). It
should be borne in mind that the bigger the scale, the more
consideration is required from the respondent.
3. The use of adjectives. An alternative to a scale is to ask
respondents which words best describe a company, a product
or a brand. The adjectives could be both positive and
negative and they need not be opposites. This could easily be
converted into a scale, for example, asking people which of
two adjectives they associate with a product or brand –
reliable v unreliable. In a self completion questionnaire a
line or scale could separate the two words and the
respondent is asked to mark the line to indicate their view.
4. The use of positioning statements. Here the respondent is
asked to agree or disagree with a number of statements. It is
important that the respondent is readily able to identify with
one of the statements and not left feeling that somehow
they do not capture their mood. Positioning statements are a
variation of the verbal rating scale and are often known as
agree/disagree scales or Likert scales after the person who
popularised them. Typically a statement is read out and the
respondent is presented with five choices such as:
Agree strongly
Agree slightly
Neither agree nor disagree
Disagree slightly
Disagree strongly
5. Ranking questions. Researchers often need to find out what is
the order of importance of various factors from a list. Typically
this is achieved by presenting the list and asking which is
most important, which is second most important and so on.
Think about
The questions we ask are who, what, when, where, why, and
how. Which of these do you think is the most difficult for peo-
ple to answer? Why is it the most difficult?
Classification questions
The third group of questions are those used to classify the informa-
tion once it has been collected. Classification questions check that
the correct quota of people or companies has been interviewed and
are used to make comparisons between different groups of respon-
dents. Most classification questions are behavioural (factual).
A number of standard classification questions crop up constantly in
market research surveys. These are:
• Gender. There can be no other classifications other than male
and female.
• Marital status. This is usually asked by simply saying “Are
you .....”
– Single ❑
– Married ❑
– Widowed ❑
– Divorced ❑
– Separated ❑
• Socio Economic Grade (SEG). This is a classification peculiar to
UK market researchers in which respondents are pigeonholed
according to the occupation of the head of the household.
Thus, it combines the attributes of income, education and
work status. In addition to social grades, researchers
sometimes classify respondents by income group or lifestyle.
In summary the socio economic grades are:
A higher managerial, administrative or professional
B intermediate managerial, administrative or professional
C1 supervisory, clerical, junior administrative or professional
C2 skilled manual workers
D semi-skilled and unskilled manual workers
E state pensioners, widows, casual and lowest grade workers.
For most practical purposes these can be reduced to just four:
AB ❑
C1 ❑
C2 ❑
DE ❑
Alternatively, a question may be asked about the income of the
respondent or the combined income of the household. The ques-
tion would be de-sensitised by using income bands.
• Industrial occupation. In Europe companies are classified
according to their Standard Industrial Classification (SIC).
Often researchers condense the many divisions into more
convenient and broader groupings such as:
In surveys of the general public, it may be relevant to establish
the level of employment of the respondent. For example:
Working full time (over 30 hours a week) ❑
Working part-time (8-30 hours a week) ❑
Housewife (full time at home) ❑
Student (full time) ❑
Retired ❑
Temporarily unemployed (but seeking work) ❑
Permanently unemployed (eg chronically sick, independent
means etc) ❑
• Number of employees. The size of the firm in which the
respondent works can be classified according to the number
of employees:
0–9 ❑
10–24 ❑
25–99 ❑
100–249 ❑
250 + ❑
• Location. Depending on the scope of the survey, this can be a
country code or in any single country a code indicating the
domicile of the respondent such as state in which they live
or a broader grouping such as East Coast, Central, West
Coast etc.
Think about
What classification questions would be most important in a sur-
vey for your company?
Key point
Classification questions are some of the most important questions
in the questionnaire as they are used to cross analyse the data and
pick up different patterns of response across different groups of
people.
Question: this is the framing of the precise questions that are
asked. Care needs to be taken to ensure that the questions elicit a
useful and unbiased response. The questions can be open ended (used
in smaller, qualitative surveys) or closed (used in quantitative
surveys).
Open ended questions: these are questions that invite free ranging
responses – sometimes called verbatim responses. Such responses are
extremely useful for obtaining a deep understanding of the
respondents’ views and behaviour but they are difficult to capture
precisely (the respondent may give a long winded answer that is
shortened by the interviewer) and are time consuming to analyse.
They are only suited to qualitative and small quantitative surveys.
Multiple response questions: some questions can receive a number
of answers and others only one answer. For example, a question that
asks how many brands someone is aware of could generate a list of
names and therefore is multiple response. Another question may
seek to find out which brand is used most frequently and this could
allow for just one response (ie single response). Sometimes the ques-
tions are marked multi-response so that the interviewer knows that
more than one answer is anticipated and allowed.
Rating scales: these are words, numbers or pictures that indicate a
range of different responses to a question. Scales suit researchers as
a means for locating a respondent’s view on a continuum but they
may not always be easy for respondents to relate to as they may not
think in such terms. Scales have engaged the interest of researchers
for years and some are named after their originator. Likert invented
an agree/disagree scale with five positions. Osgood gave his name to
the bi-polar scale. The Thurstone scale starts by generating a list of
possible statements that relate to a subject and then a distilled list is
created which is the scale of issues that covers the subject.
Routing: the instructions that tell an interviewer or a respondent
where to go next when completing the questionnaire.
Trade-off question: at its simplest this could be a question which
asks the respondent to spend a number of points between factors
that influence their choice of a product or brand. The more sophis-
ticated trade-off questions ask respondents to express their prefer-
ences between pairs of attributes or between concepts (with a price
attached). This is a conjoint measurement that produces utility val-
ues indicating the weight of importance attached to the different
attributes.
• Personal, emotional or complicated questions should be at
the end to avoid people being put off answering further
questions
Obtaining a market research interview is not easy, especially given
the large number of surveys that are taking place and the bombard-
ment of our privacy through the ‘cold call’ selling of financial ser-
vices or double-glazing. The respondent believes, with some
justification, that they are giving up their valuable time and may be
getting little in return.
It is in the opening seconds of the introduction that the interview
will be won or lost and so the questionnaire must have an intro-
duction with a hook that interests the respondents.
Skills are required on the part of the interviewer to communicate
the introduction as quickly as possible so that respondent can start
talking and answering the questions. The more information that is
packed in to the introduction and the longer it takes, the more time
a respondent will have to think of reasons why they don’t want to
take part. A fast engagement is vital.
The interviewer’s approach really does make a difference.
Respondents like to feel that they are in the hands of a professional.
Someone that is businesslike without being pushy.
Respondents will talk to people they trust. Building trust in a few
seconds is difficult when the interviewer has only their voice and
words. However both can be powerful ordnance if they are used cor-
rectly. The right words and voice will create legitimacy for the inter-
view. The wrong ones will result in the brush off. It does therefore
help to have a script prepared before making contact with the
respondent to ensure that the introduction is, as near as possible,
the best one to win trust and co-operation.
In most cases, once a respondent has started the interview, they will
see it through to completion. Compliance is not a foregone conclu-
sion and a different set of skills is needed for the execution of the
interview itself.
The crucial requirement of any interview is to know the question-
naire thoroughly. This is especially the case with paper based ques-
tionnaires, as complex routings could break the flow of the
interview.
The interview is, of course, a script of a kind and the questions have
to be read out exactly as stated. Good interviewers develop their
own style, speaking at a moderate pace and with good clarity and
diction. And, although it may be the last interview in a busy and
tiring day, they must sound interested. In fact, they need to be
interested because a good interviewer really does have to listen.
Although the questionnaire is a script, and it must be adhered to,
there is scope to build in social lubrication and verbal
encouragements that indicate the interviewer is listening and is
interested. The body language of the voice becomes even more
important in telephone interviews as there is nothing else to create
a rapport.
Key point
A good questionnaire will be successful in collecting accurate facts
and opinions and will be an enjoyable event for the respondent.
By the time the interview is finished, a relationship will have been
created with the respondent. The respondent should be thanked for
their time and effort and it may be appropriate to ask permission to
call again should it be necessary to clarify any of the answers.
(This is more important in business to business interviews.)
Think about
Write an introduction to a questionnaire that you think would
be successful in winning your cooperation. The introduction
should include all the necessary coverage of who is carrying out
the survey (not necessarily who is commissioning it), promises of
anonymity and confidentiality, how long it will take and a per-
suasive hook. See if you can use less than 100 words.
SCARY STORY
In the 1980s, Coke became seriously concerned that it was losing
market share to Pepsi. In 1984 it only had a 4.9% lead of Pepsi in
the US. This was despite the fact that Coke outspent Pepsi on
advertising, by upwards of $100 million per year. One major
problem was that Pepsi’s advertising was simply more effective.
The Pepsi Challenge had been fabulously successful: Pepsi made
great play in its ads that in blind taste tests, people preferred
Pepsi to Coke.
Roy Stout, head of market research for Coca-Cola USA, put it this
way, “If we have twice as many vending machines, dominate
fountain, have more shelf space, spend more on advertising, and
are competitively priced, why are we losing share? You look at
the Pepsi Challenge, and you have to begin asking about taste.”
In September 1984, Coke thought that it had found the answer
with a new formula that beat Pepsi in blind taste tests by as much
as 6 to 8%. Bearing in mind that Pepsi had beaten Coke by any-
where from 10 to 15%, this was an 18% swing. All discouraging
market research was tossed into the bin and New Coke was
launched – with disastrous results.
When it hit the streets, New Coke was rejected by huge groups of
people. Comments were received such as “sewer water”, “furni-
ture polish”, “Coke for wimps”, “two-day-old Pepsi”, and “I miss
the battery acid tang”.
What we can learn from this story is not that the research carried
out by Coke or by Pepsi was wrong; rather that the wrong ques-
tions were asked. An assumption was made that Coke drinkers
chose the drink on taste and this became the subject of the study.
In fact the reality was far more subtle and the main driver of
choice was the brand. For years Coke was promoted as “the real
thing” and with the launch of New Coke, it implied that they
had been duped.
CHAPTER 15
INDEX NUMBERS
LEARNING OBJECTIVES
When you have completed this chapter, you
will be able to:
LO1 Compute and interpret a simple index.
LO6 Explain how the Consumer Price Index is constructed and interpreted.
Information on prices and quantities for margarine, shortening, milk, and
potato chips for the years 2004 and 2014 is provided in Exercise 27.
Compute a simple price index for each of the four items, using 2004 as
the base period. (Exercise 27, LO1)
15.1 INTRODUCTION
In this chapter, we will examine a useful descriptive tool called an index. An index ex-
presses the relative change in a value from one period to another. No doubt you are fa-
miliar with indexes such as the Consumer Price Index (CPI), which is released monthly
by Statistics Canada. There are many other indexes, such as the S&P/TSX Composite
Index, Dow Jones Industrial Average (DJIA), Nasdaq, and the NIKKEI 225. Indexes are
published on a regular basis by the federal government; by business publications, such
as BusinessWeek and Forbes; in most daily newspapers; and on the Internet.
Of what importance is an index? Why is the CPI so important and so widely
reported? As the name implies, it measures the change in the price of a large group of
items consumers purchase. Governments, consumer groups, unions, management,
senior citizens organizations, and others in business and economics are very concerned
about changes in prices. These groups closely monitor the CPI as well as other indexes.
To combat sharp price increases, the Bank of Canada often raises the interest rate to “cool
down” the economy. Likewise, the S&P/TSX Composite Index measures the overall daily
performance of the largest publicly traded companies on the Toronto Stock Exchange.
A few stock market indexes appear daily in the financial section of most newspa-
pers. Many are reported in real time on several websites.
Example According to Statistics Canada, the average total undergraduate tuition fees for full-time stu-
dents were $4025 in the 2003–2004 academic year and $5772 in the 2013–2014 academic
year. What is the index of the average total undergraduate tuition fees for full-time students for
the 2013–2014 academic year based on the 2003–2004 academic year?
Solution It is 143.4, found by:
\[ I = \frac{\text{Average total undergraduate tuition fees, 2013–2014 academic year}}{\text{Average total undergraduate tuition fees, 2003–2004 academic year}} \times 100 = \frac{5772}{4025}(100) = 143.4 \]
Thus, the average total undergraduate tuition fees for the 2013–2014 academic year compared
with the average total undergraduate tuition fees for the 2003–2004 academic year is 143.4. This
means that there was a 43.4% increase in the average tuition fees during the 10-year period.
Source: Statistics Canada statcan.gc.ca/daily-quotidien/130912/dq130912b-eng.htm
Example An index can also compare one item with another. The population of British Columbia in
2013 was estimated at 4 606 371, and for Ontario, it was estimated at 13 603 904. What is the
population of British Columbia compared with Ontario?
Solution The index of population for British Columbia is 33.9, found by:
\[ I = \frac{\text{Population of British Columbia}}{\text{Population of Ontario}} \times 100 = \frac{4\,606\,371}{13\,603\,904}(100) = 33.9 \]
This indicates that the population of British Columbia is 33.9% (about one-third) of the popula-
tion of Ontario, or the population of British Columbia is 66.1% less than the population of
Ontario (100 - 33.9 = 66.1).
Source: Statistics Canada www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=0510005&paSer=&pattern=&stByVal=1&p1=1&p2=31&tabMode=dataTable&csid=
Example The numbers of passengers (in millions) for the five busiest airports in Canada in 2013 are
given below. What is the index for Toronto, Vancouver, Calgary, and Montreal compared with
Edmonton?
City        Airport                                        Passengers (millions)   Index
Toronto     Toronto Pearson International Airport                  36.1            515.7
Vancouver   Vancouver International Airport                        18.0            257.1
Calgary     Calgary International Airport                          14.3            204.3
Montreal    Pierre Elliott Trudeau International Airport           14.1            201.4
Edmonton    Edmonton International Airport                          7.0            100.0
Solution To find the four indexes, we divide the numbers of passengers for Toronto, Vancouver, Calgary,
and Montreal by the number at Edmonton and multiply by 100. We conclude that Calgary had
104.3% more passengers than Edmonton, Montreal 101.4% more, Vancouver 157.1% more, and
Toronto 415.7% more.
Source: en.wikipedia.org/wiki/List_of_the_busiest_airports_in_Canada#Canada.27s_21_busiest_airports_by_passenger_traffic
City        Airport                                        Passengers (millions)   Index   Found by
Toronto     Toronto Pearson International Airport                  36.1            515.7   36.1/7.0*100
Vancouver   Vancouver International Airport                        18.0            257.1   18.0/7.0*100
Calgary     Calgary International Airport                          14.3            204.3   14.3/7.0*100
Montreal    Pierre Elliott Trudeau International Airport           14.1            201.4   14.1/7.0*100
Edmonton    Edmonton International Airport                          7.0            100.0   7.0/7.0*100
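The "Found by" column can be reproduced with a short Python sketch. The dictionary and variable names are chosen here for illustration only; the passenger figures are the ones in the table.

```python
# Passengers in millions, 2013, as given in the table.
passengers = {
    "Toronto": 36.1,
    "Vancouver": 18.0,
    "Calgary": 14.3,
    "Montreal": 14.1,
    "Edmonton": 7.0,
}

base = passengers["Edmonton"]  # Edmonton is the comparison base (index = 100)

# Divide each airport's count by the base and multiply by 100.
airport_index = {city: round(n / base * 100, 1) for city, n in passengers.items()}

for city, idx in airport_index.items():
    # idx - 100 is the percentage by which the city exceeds Edmonton.
    print(f"{city:10s} index = {idx:6.1f}  ({idx - 100:+.1f}% vs. Edmonton)")
```

Subtracting 100 from each index gives the "percent more than Edmonton" figures quoted in the solution.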
1. Index numbers are actually percentages because they are based on the number 100. How-
ever, the percent symbol is usually omitted.
2. Each index has a base period. The current base period for the CPI is 2002 = 100, changed
from 1992 = 100.
3. Most business and economic indexes are computed to the nearest whole number, such as
214 or 96, or to the nearest tenth of a percentage, such as 83.4 or 118.7.
SIMPLE INDEX   \[ P = \frac{p_t}{p_0} \times 100 \]   [15–1]
Suppose that the price of a ski weekend package (including meals and lift tickets) at Blue
Mountain was $600 in 2004. The price rose to $795 in 2014. What is the price index for 2014
using 2004 as the base period and 100 as the base value? It is 132.5, found by:
\[ P = \frac{p_t}{p_0} \times 100 = \frac{\$795}{\$600}(100) = 132.5 \]
Interpreting this result, the price of the ski weekend increased 32.5% from 2004 to 2014.
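As a minimal sketch, formula (15–1) can be written as a small Python function. The function name `simple_index` and its parameter names are illustrative choices, not part of the text.

```python
def simple_index(value_t, value_0, base_value=100.0):
    """Simple index: value in the given period relative to the base period."""
    if value_0 == 0:
        raise ValueError("the base-period value must be non-zero")
    return value_t / value_0 * base_value


# Ski weekend package: $600 in 2004 (base), $795 in 2014.
ski_index = simple_index(795, 600)
pct_change = ski_index - 100  # percentage change since the base period

print(round(ski_index, 1), round(pct_change, 1))
```

The same function covers the tuition example (5772 against 4025) and any other two-value comparison.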
The base period need not be a single year. Note in Table 15–1 that if we use 2005–2006 =
100, the base price for the stapler would be $21 [found by determining the mean price of 2005
and 2006: ($20 + $22)/2 = $21]. If 2005–2007 had been selected as the base, the prices $20, $22,
and $23 would be averaged, giving a mean price of $21.67. The indexes constructed using the
three different base periods are presented in Table 15–1. (Note that when 2005–2007 = 100,
the index numbers for 2005, 2006, and 2007 average 100.0, as we would expect.) Logically, the
index numbers for 2014 using the three different bases are not the same.
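A multi-year base can be sketched in Python as follows. Only the three stapler prices quoted in the paragraph ($20, $22, $23 for 2005 to 2007) are used; the function name `index_on_base` is a hypothetical helper introduced for this illustration.

```python
# Stapler prices quoted in the text.
prices = {2005: 20.0, 2006: 22.0, 2007: 23.0}


def index_on_base(prices, base_years):
    """Index each year's price on the mean price of the base years (= 100)."""
    base_price = sum(prices[y] for y in base_years) / len(base_years)
    return {year: p / base_price * 100 for year, p in prices.items()}


idx_0506 = index_on_base(prices, [2005, 2006])        # base price $21
idx_0507 = index_on_base(prices, [2005, 2006, 2007])  # base price $21.67

# When 2005-2007 = 100, the three base-year indexes average 100.
mean_0507 = sum(idx_0507.values()) / len(idx_0507)
print({y: round(v, 1) for y, v in idx_0507.items()}, round(mean_0507, 1))
```

Changing the base changes every index value, which is why the 2014 index differs across the three bases.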
self-review 15–1 1. Listed below are the amounts of steel produced (in millions of tonnes) by the top steel-producing nations in 2013. Express
the amount produced by China, the European Union, Japan, and India as an index, using the
United States as a base. What percentage more steel does China produce than the United States?
Amount
Nation (million tonnes)
People’s Republic of China 779.0
European Union 165.8
Japan 110.6
United States 86.9
India 81.2
Source: issb.co.uk/global.html#CSP
2. The average weekly earnings (including overtime), educational and related services, in
Canada from 2008–2012 are given below:
Year Average Weekly Earnings
2008 $742.69
2009 770.30
2010 787.37
2011 808.69
2012 816.48
(a) Using 2008 as the base period and 100 as the base value, determine the indexes for 2008–2012.
Interpret the index.
(b) Use the average of 2009, 2010, and 2011 as the base and determine indexes for 2008–2012, us-
ing 100 as the base value.
(c) What is the index for 2011 data using 2009 as the base?
EXERCISES
1. Average house prices in dollars for Manitoba from January 2008 to January 2014 are listed below:
Develop a simple index for the change in list price based on the average of years 2010–2012.
2. The following table shows the average cost of a 1-bedroom apartment for selected cities across
Canada. See Connect for the data set.
Source: newsroom.bmo.com/press-releases/bmo-rider-nation-tops-canada-s-cities-and-regions-tsx-bmo-201311210913105001.
Retrieved February 25, 2014.
a. Develop a simple index with January 2008 as the base year to show the change in the listed
prices. By what percentage did the list price increase over the seven years?
b. Develop a simple index with the average of January 2010 to January 2012 as the base year to
show the change in the list prices.
4. In January 2004, the price for a whole fresh chicken was $1.99 per kilogram. In September 2014, the
price for the same chicken was $5.49. Use the January 2004 price as the base period and 100 as the
base value to develop a simple index. By what percentage has the cost of chicken increased during
the 10-year period?
We could begin by computing a simple price index for each item, using 2004 as the
base year and 2014 as the given year. The simple index for bread is 204.1, found by using
formula (15–1).
\[ P = \frac{p_t}{p_0} \times 100 = \frac{\$1.98}{\$0.97}(100) = 204.1 \]
We compute the simple index for the other items in Table 15–2 similarly. The largest price in-
crease is for bread, 104.1% (204.1 - 100 = 104.1), and milk was second at 45.9%. The price of
eggs increased 18.4% in the period, found by: 118.4 - 100.0 = 18.4. Then it would be natural
to average the simple indexes. The formula is:
SIMPLE AVERAGE OF THE PRICE INDEXES   \[ P = \frac{\sum P_i}{n} \]   [15–2]
where Pi refers to the simple index for each of the items and n the number of items. In our
example, the index is 140.7, found by:
\[ P = \frac{\sum P_i}{n} = \frac{204.1 + 118.4 + \cdots + 129.4}{6} = \frac{844.3}{6} = 140.7 \]
This indicates that the mean of the group of indexes increased 40.7% from 2004 to 2014.
A positive feature of the simple average of price indexes is that we would obtain the same
value for the index regardless of the units of measure. In the above index, if apples were priced in
tonnes, instead of kilograms, the impact of apples on the combined index would not change.
That is, the commodity “apples” represents one of six items in the index, so the impact of the
item is not related to the units. A negative feature of this index is that it fails to consider the rela-
tive importance of the items included in the index. For example, milk and eggs receive the same
weight, even though a typical family might spend far more over the year on milk than on eggs.
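The simple average of the price indexes, formula (15–2), can be sketched in Python using the six food prices that appear in Table 15–3. The dictionary names are illustrative.

```python
# Prices in dollars for the six food items (2004 is the base year).
prices_2004 = {"bread": 0.97, "eggs": 1.85, "milk": 0.98,
               "apples": 1.98, "orange juice": 1.58, "coffee": 5.40}
prices_2014 = {"bread": 1.98, "eggs": 2.19, "milk": 1.43,
               "apples": 2.75, "orange juice": 1.70, "coffee": 6.99}

# One simple index per item: P_i = p_t / p_0 * 100.
item_index = {item: prices_2014[item] / prices_2004[item] * 100
              for item in prices_2004}

# Formula (15-2): the unweighted mean of the item indexes.
average_index = sum(item_index.values()) / len(item_index)

print({k: round(v, 1) for k, v in item_index.items()})
print(round(average_index, 1))
```

Because each item enters as a ratio, rescaling one item's unit of measure (multiplying both of its prices by the same factor) leaves the result unchanged.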
SIMPLE AGGREGATE INDEX   \[ P = \frac{\sum p_t}{\sum p_0} \times 100 \]   [15–3]
This is called a simple aggregate index. The index for the above food items is found by sum-
ming the prices in 2004 and 2014. The sum of the prices for the base period is $12.76 and for
the given period, it is $17.04. The simple aggregate index is 133.5. This means that the aggre-
gate group of prices increased 33.5% in the 10-year period.
\[ P = \frac{\sum p_t}{\sum p_0} \times 100 = \frac{\$17.04}{\$12.76}(100) = 133.5 \]
Because the value of a simple aggregate index can be influenced by the units of measure-
ment, it is not used frequently. In our example, the value of the index would differ significantly
if we were to report the price of apples in tonnes rather than kilograms. Also, note the effect of
coffee on the total index. For both the current year and the base year, the value of coffee is
slightly more than 40% of the total index, so a change in the price of coffee will drive the index
much more than any other item. Therefore, we need a way to appropriately “weight” the items
according to their relative importance.
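The unit sensitivity of the simple aggregate index can be seen in a short Python sketch. The prices are those of the six food items; the tenfold rescaling of the apple price (pricing a 5 kg box instead of 500 g) is a hypothetical change made here only to illustrate the point.

```python
prices_2004 = {"bread": 0.97, "eggs": 1.85, "milk": 0.98,
               "apples": 1.98, "orange juice": 1.58, "coffee": 5.40}
prices_2014 = {"bread": 1.98, "eggs": 2.19, "milk": 1.43,
               "apples": 2.75, "orange juice": 1.70, "coffee": 6.99}


def simple_aggregate(p0, pt):
    """Formula (15-3): sum of given-period prices over sum of base prices."""
    return sum(pt.values()) / sum(p0.values()) * 100


aggregate = simple_aggregate(prices_2004, prices_2014)

# Hypothetical: quote apples per 5 kg instead of per 500 g (10 times the price).
rescaled_2004 = {**prices_2004, "apples": prices_2004["apples"] * 10}
rescaled_2014 = {**prices_2014, "apples": prices_2014["apples"] * 10}
aggregate_rescaled = simple_aggregate(rescaled_2004, rescaled_2014)

print(round(aggregate, 1), round(aggregate_rescaled, 1))
```

The two results differ even though no price actually changed, which is the weakness described above.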
LASPEYRES PRICE INDEX   \[ P = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100 \]   [15–4]
where:
P is the price index.
pt is the current price.
p0 is the price in the base period.
q0 is the quantity used in the base period.
Example The prices for the six food items from Table 15–2 are repeated below in Table 15–3. The num-
ber of units of each consumed by a typical family in 2004 and 2014 is also included.
TABLE 15–3 Price and Quantity of Food Items, 2004 = 100
                                               2004                   2014
Item                                   Price ($)   Quantity   Price ($)   Quantity
Bread, white (loaf)                      0.97          50        1.98         55
Eggs (dozen)                             1.85          26        2.19         20
Milk, white (litre)                      0.98         102        1.43        130
Apples, red delicious (500 g)            1.98          30        2.75         40
Orange juice (355 mL concentrate)        1.58          40        1.70         41
Coffee, 100% ground roast (400 g)        5.40          12        6.99         12
Determine a weighted price index using the Laspeyres method. Interpret the result.
Solution First, we determine the total amount spent for the six items in the base period, 2004. To find
this value, we multiply the base period price for bread ($0.97) by the base period quantity of
50. The result is $48.50. This indicates that a total of $48.50 was spent in the base period on
bread. We continue that for all items and total the results. The base period total is $383.96. The
current period total is computed in a similar fashion. For the first item, bread, we multiply the
quantity in 2004 by the price of bread in 2014, that is, $1.98(50). The result is $99.00. We make
the same calculation for each item and total the result. The total is $536.18. Because of the re-
petitive nature of these calculations, a spreadsheet is effective for carrying out the calculations.
The Excel output showing the calculations is given below:
                                               2004                   2014
Item                                   Price ($)   Quantity   Price ($)   Quantity     p0q0      ptq0
Bread, white (loaf)                      0.97          50        1.98         55      $48.50    $99.00
Eggs (dozen)                             1.85          26        2.19         20       48.10     56.94
Milk, white (litre)                      0.98         102        1.43        130       99.96    145.86
Apples, red delicious (500 g)            1.98          30        2.75         40       59.40     82.50
Orange juice (355 mL concentrate)        1.58          40        1.70         41       63.20     68.00
Coffee, 100% ground roast (400 g)        5.40          12        6.99         12       64.80     83.88
Total                                                                                $383.96   $536.18
Laspeyres: (536.18/383.96)(100) = 139.6
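The spreadsheet calculation can also be sketched in Python. This is an illustration of formula (15–4) on the Table 15–3 data, not a reproduction of the Excel workbook; the tuple layout and function name are choices made here.

```python
# (item, p0, q0, pt, qt): 2004 price and quantity, 2014 price and quantity.
basket = [
    ("Bread", 0.97, 50, 1.98, 55),
    ("Eggs", 1.85, 26, 2.19, 20),
    ("Milk", 0.98, 102, 1.43, 130),
    ("Apples", 1.98, 30, 2.75, 40),
    ("Orange juice", 1.58, 40, 1.70, 41),
    ("Coffee", 5.40, 12, 6.99, 12),
]


def laspeyres(rows):
    """Laspeyres price index: current prices weighted by base-period quantities."""
    current_cost = sum(pt * q0 for _, p0, q0, pt, qt in rows)  # sum of pt*q0
    base_cost = sum(p0 * q0 for _, p0, q0, pt, qt in rows)     # sum of p0*q0
    return current_cost / base_cost * 100


laspeyres_index = laspeyres(basket)
print(round(laspeyres_index, 1))
```

Read as a percentage, the result says the 2004 basket of goods cost about 39.6% more at 2014 prices.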
PAASCHE PRICE INDEX   \[ P = \frac{\sum p_t q_t}{\sum p_0 q_t} \times 100 \]   [15–5]
Example Use the information from Table 15–3 to determine the Paasche index. Discuss which of the
indexes should be used.
Solution Again, because of the repetitive nature of the calculations, Excel is used to perform the calcula-
tions. The results are shown in the following output:
                                               2004                   2014
Item                                   Price ($)   Quantity   Price ($)   Quantity     p0qt      ptqt
Bread, white (loaf)                      0.97          50        1.98         55      $53.35   $108.90
Eggs (dozen)                             1.85          26        2.19         20       37.00     43.80
Milk, white (litre)                      0.98         102        1.43        130      127.40    185.90
Apples, red delicious (500 g)            1.98          30        2.75         40       79.20    110.00
Orange juice (355 mL concentrate)        1.58          40        1.70         41       64.78     69.70
Coffee, 100% ground roast (400 g)        5.40          12        6.99         12       64.80     83.88
Total                                                                                $426.53   $602.18
Paasche: (602.18/426.53)(100) = 141.2
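A Python sketch of formula (15–5) on the same data follows. It differs from a Laspeyres calculation only in using current-period quantities as weights; the names are illustrative.

```python
# (item, p0, q0, pt, qt): 2004 price and quantity, 2014 price and quantity.
basket = [
    ("Bread", 0.97, 50, 1.98, 55),
    ("Eggs", 1.85, 26, 2.19, 20),
    ("Milk", 0.98, 102, 1.43, 130),
    ("Apples", 1.98, 30, 2.75, 40),
    ("Orange juice", 1.58, 40, 1.70, 41),
    ("Coffee", 5.40, 12, 6.99, 12),
]


def paasche(rows):
    """Paasche price index: both price sets weighted by current-period quantities."""
    current_cost = sum(pt * qt for _, p0, q0, pt, qt in rows)  # sum of pt*qt
    base_cost = sum(p0 * qt for _, p0, q0, pt, qt in rows)     # sum of p0*qt
    return current_cost / base_cost * 100


paasche_index = paasche(basket)
print(round(paasche_index, 1))
```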
How do we decide which index to use? When is the Laspeyres index more appropriate,
and when is the Paasche index the better choice?
Laspeyres’ Index
Advantages Requires quantity data from only the base period. This allows a more
meaningful comparison over time. The changes in the index can be at-
tributed to changes in the price.
Disadvantages Does not reflect changes in buying patterns over time. Also, it may over-
weight goods whose prices increase.
Paasche’s Index
Advantages Because it uses quantities from the current period, it reflects current buy-
ing habits.
Disadvantages It requires quantity data for the current year. Because different quantities
are used each year, it is impossible to attribute changes in the index to
changes in price alone. It tends to overweight the goods whose prices
have declined. It requires the prices to be recomputed each year.
FISHER’S IDEAL INDEX   \[ \text{Fisher’s ideal index} = \sqrt{(\text{Laspeyres’ index})(\text{Paasche’s index})} \]   [15–6]
The Fisher’s index seems to be theoretically ideal because it combines the best features of
both Laspeyres’ and Paasche’s. That is, it balances the effects of the two indexes. However, it is
rarely used in practice because it has the same basic set of problems as the Paasche index. It
requires that a new set of quantities be determined for each year.
Example Determine the Fisher’s ideal index for the data in Table 15–3.
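The worked solution is not reproduced in these notes, so the following Python sketch simply applies formula (15–6) to the Laspeyres and Paasche totals already computed for Table 15–3. Variable names are illustrative.

```python
import math

# Totals from the Laspeyres and Paasche calculations for Table 15-3.
laspeyres_index = 536.18 / 383.96 * 100  # about 139.6
paasche_index = 602.18 / 426.53 * 100    # about 141.2

# Formula (15-6): geometric mean of the two indexes.
fisher_index = math.sqrt(laspeyres_index * paasche_index)

print(round(fisher_index, 1))
```

As a geometric mean, the Fisher value always lies between the Laspeyres and Paasche values.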
self-review 15–2 An index of clothing prices for 2014 based on 2004 is to be constructed. The clothing items consid-
ered are shoes and dresses. The information for prices and quantities for both years is given below.
Use 2004 as the base period and 100 as the base value.
2004 2014
Item Price ($) Quantity Price ($) Quantity
Dress (each) $75 500 $85 520
Shoes (pair) 40 1200 45 1300
EXERCISES
For exercises 5–8:
a. Determine the simple price indexes.
b. Determine the simple aggregate price indexes for the two years.
c. Determine Laspeyres’ price index.
d. Determine Paasche’s price index
e. Determine Fisher’s ideal index.
5. The prices of toothpaste (100 mL), shampoo (500 mL), cough tablets (package of 100), and antiperspi-
rant (45 g) for August 2004 and August 2014 are given below. The quantities purchased are also in-
cluded. Use August 2004 as the base.
6. Fruit prices and the amounts consumed for 2004 and 2014 are given below. Use 2004 as the base.
2004 2014
Item Price ($) Quantity Price ($) Quantity
Bananas (pounds [lb]) $0.23 100 $0.49 120
Grapefruit (each) 0.29 50 0.27 55
Apples 0.35 85 0.35 85
Strawberries (basket) 1.02 8 1.99 10
Oranges (bag) 0.89 6 2.99 8
7. The prices and the numbers of various items produced by a small machine and stamping plant are
reported below. Use 2003 as the base.
2003 2013
Item Price ($) Quantity Price ($) Quantity
Washer $0.07 17 000 $0.10 20 000
Cotter pin 0.04 125 000 0.10 130 000
Stove bolt 0.15 40 000 0.18 42 000
Hex nut 0.08 62 000 0.10 65 000
8. The quantities and prices of office supplies for the years 2004 and 2014 for Sam’s Student Centre are
given below:
2004 2014
Item Price ($) Quantity Price ($) Quantity
Pens (dozen) $0.90 50 $1.10 55
Pencils (dozen) 0.65 50 0.80 60
Erasers (each) 0.45 250 0.55 275
Paper, lined (package [pkg]) 0.89 500 1.09 750
Paper, printer (pkg) 5.99 300 4.99 450
Printer (cartridges) 15.99 150 19.99 200
VALUE INDEX   \[ V = \frac{\sum p_t q_t}{\sum p_0 q_0} \times 100 \]   [15–7]
Example The prices and quantities sold at the Waleska Department Store for various items of apparel for
May 2003 and May 2013 are as follows:
                         2003                                2013
Item            Price, p0 ($)   Quantity Sold       Price, pt ($)   Quantity Sold
                                (thousands), q0                     (thousands), qt
Ties (each)          10             1000                 12              900
Suits (each)        300              100                400              120
Shoes (pair)        100              500                120              500
What is the index of value for May 2013, using May 2003 as the base period?
Solution Total sales in May 2013 were $118 800 000, and the comparable figure for 2003 is $90 000 000
(Table 15–4). Thus, the index of value for May 2013 using 2003 = 100 is 132.0. The value of
apparel sales in 2013 was 132% of the 2003 sales. To put it another way, the value of apparel
sales increased 32% from May 2003 to May 2013.
\[ V = \frac{\sum p_t q_t}{\sum p_0 q_0} \times 100 = \frac{118\,800}{90\,000}(100) = 132.0 \]
TABLE 15–4
                              2003                                           2013
Item            Price,    Quantity Sold     p0q0             Price,    Quantity Sold     ptqt
                p0 ($)    (thousands), q0   ($ thousands)    pt ($)    (thousands), qt   ($ thousands)
Ties (each)       10          1000           10 000            12           900            10 800
Suits (each)     300           100           30 000           400           120            48 000
Shoes (pair)     100           500           50 000           120           500            60 000
Total                                       $90 000                                      $118 800
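The value index in formula (15–7) can be sketched in Python from the same table. Quantities are in thousands, so the totals are in thousands of dollars; the tuple layout is a choice made for this illustration.

```python
# (item, p0, q0, pt, qt): May 2003 and May 2013 prices ($) and quantities (thousands).
apparel = [
    ("Ties", 10, 1000, 12, 900),
    ("Suits", 300, 100, 400, 120),
    ("Shoes", 100, 500, 120, 500),
]

value_2003 = sum(p0 * q0 for _, p0, q0, pt, qt in apparel)  # $ thousands
value_2013 = sum(pt * qt for _, p0, q0, pt, qt in apparel)  # $ thousands

# Formula (15-7): both prices and quantities change between the periods.
value_index = value_2013 / value_2003 * 100

print(value_2003, value_2013, round(value_index, 1))
```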
self-review 15–3 The number of items produced by Houghton Products for 2004 and 2014 and the wholesale prices
for the two periods are as follows:
(a) Find the index of the value of production for 2014 using 2004 as the base period.
(b) Interpret the index.
EXERCISES
9. The prices and production of grains for August 2004 and August 2014 are as follows:
                      2004                                  2014
Grain     Price ($)   Quantity Produced         Price ($)   Quantity Produced
                      (millions of bushels)                 (millions of bushels)
Oats        1.52           200                    1.87           214
Wheat       2.10           565                    2.05           489
Corn        1.48           291                    1.48           203
Barley      3.05            87                    3.29           106
Using 2004 as the base period, find the value index of grains produced for August 2014.
10. The Johnson Wholesale Company manufactures a variety of products. The prices and quantities
produced for April 2004 and April 2014 are as follows:
2004 2014
2004 Quantity 2014 Quantity
Product Price ($) Produced Price ($) Produced
Small motor (each) $23.60 1760 $28.80 4259
Scrubbing compound (litre) 2.96 86 450 3.08 62 949
Nails (pound) 0.40 9460 0.48 22 370
Using April 2004 as the base period, find the index of the value of goods produced for April 2014.
Example A provincial Chamber of Commerce wants to develop a measure of general business activity
for the southwest area of the province. The director of economic development has been
assigned to develop the index. It will be called the General Business Activity Index of the
Southwest Region.
Solution After considerable thought and research, the director concluded that there were four factors
to be looked at: the regional department store sales (which are reported in $thousands), the
regional employment index (which has a 2003 base and is reported by the province), the
vehicle traffic reported in the region, determined by independent studies (reported in thou-
sands), and exports of the industries in the region (in tonnes). The most recent available
data are reported below:
After review and consultation, the director assigned weights of 40% to department store sales,
30% to employment, 10% to vehicle traffic, and 20% to exports.
To develop the General Business Activity Index of the Southwest Region for 2013 using
2003 = 100, each 2013 value is expressed as a percentage. For example, department store
sales for 2013 are converted to a percentage by (4400/2000)(100) = 220. This means that de-
partment store sales have increased by 120% in the period. This percentage is then multiplied
by the appropriate weight. For the department store sales, this is (220)(0.40) = 88.0. The details
of the calculations are as follows:
                          2008                               2013
Department Store Sales    (4100/2000)(100)(0.40) = 82.0      (4400/2000)(100)(0.40) = 88.0
Index of Employment       (110/100)(100)(0.30)   = 33.0      (125/100)(100)(0.30)   = 37.5
Vehicle Traffic           (300/500)(100)(0.10)   =  6.0      (180/500)(100)(0.10)   =  3.6
Exports                   (900/500)(100)(0.20)   = 36.0      (700/500)(100)(0.20)   = 28.0
Total                                             157.0                              157.1
The General Business Activity Index of the Southwest Region for 2008 is 157.0, and for 2013,
it is 157.1. Interpreting, business activity has increased by 57.0% from 2003 to 2008 and by 57.1%
from 2003 to 2013.
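The weighting scheme can be sketched in Python. The 2003 base values (2000, 100, 500, 500) are read off the denominators in the calculation table, since the underlying data table is not reproduced in these notes; the dictionary keys and the function name are illustrative.

```python
# Base-period (2003) values, taken from the denominators in the calculations.
base_2003 = {"sales": 2000, "employment": 100, "traffic": 500, "exports": 500}

# Weights assigned by the director (they sum to 1).
weights = {"sales": 0.40, "employment": 0.30, "traffic": 0.10, "exports": 0.20}

observed = {
    2008: {"sales": 4100, "employment": 110, "traffic": 300, "exports": 900},
    2013: {"sales": 4400, "employment": 125, "traffic": 180, "exports": 700},
}


def composite_index(current, base, weights):
    """Weighted sum of the component percentages relative to the base period."""
    return sum(current[k] / base[k] * 100 * weights[k] for k in weights)


activity = {year: composite_index(vals, base_2003, weights)
            for year, vals in observed.items()}
print({year: round(v, 1) for year, v in activity.items()})
```

Both values are measured against 2003 = 100, so each one describes the change since 2003.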
As we stated at the start of the section, there are many special-purpose indexes. Here are a
few examples:
The Consumer Price Index Statistics Canada reports this index monthly. It describes the
changes in prices from one period to another for a “market basket” of goods and services. The
base year for the CPI as of 2014 is 2002 = 100.0. A historical summary of the CPI for Canada
from 2003 to 2013 follows. We present some applications later in the chapter.
S&P/TSX Composite Index Introduced in 1977 as The TSE 300 Composite Index, the
Toronto Stock Exchange’s composite index represented the average performance of 300 of
Canada’s largest public companies traded on the Toronto Stock Exchange. Effective May 2002,
the index was renamed S&P/TSX and is no longer restricted to 300 companies.
Dow Jones Industrial Average This is an index of stock prices, but perhaps it would be
better to say that it is an “indicator” rather than an index. It is supposed to be the mean price of
30 specific industrial stocks. However, summing the 30 stocks and dividing by 30 does not
calculate its value. This is because of stock splits, mergers, and stocks being added or dropped.
When changes occur, adjustments are made in the denominator used with the average. Today,
the DJIA is more of a psychological indicator than a representation of the general price move-
ment on the New York Stock Exchange (NYSE). The lack of representativeness of the stocks
on the DJIA is one of the reasons for the development of the NYSE Index. This index was de-
veloped as an average price of all stocks on the NYSE.
There are many other indexes that track business and economic behaviour, such as the
Nasdaq and the Russell 2000.
Real Income As an example of the meaning and computation of real income, assume the
CPI is 122.8 (2013) with 2002 = 100. Also, assume that Ms. Watts earned $35 000 in the base
period of 2002. She has a current income of $42 980. Note that although her money income has
increased by 22.8% since the base period of 2002, the prices she paid for food, gasoline,
clothing, and other items have also increased by 22.8%. Thus, Ms. Watts’ standard of living has
remained the same from the base period to the present time. Price increases have exactly
offset an increase in income, so her present buying power (real income) is still $35 000. (See
Table 15–5 for computations.) In general:
REAL INCOME   \[ \text{Real income} = \frac{\text{Money income}}{\text{CPI}} \times 100 \]   [15–8]
Year   Money Income   Consumer Price Index   Real Income   Computation of
                      (2002 = 100)                         Real Income
2002     $35 000           100.0              $35 000      (35 000/100)(100)
2013      42 980           122.8               35 000      (42 980/122.8)(100)
The concept of real income is sometimes called deflated income, and the CPI is called
the deflator. Also, a popular term for deflated income is income expressed in constant dollars.
Thus, in Table 15–5, to determine whether Ms. Watts’ standard of living changed, her money
income was converted to constant dollars. We found that her purchasing power, expressed in
2002 dollars (constant dollars), remained at $35 000.
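Formula (15–8) applied to Ms. Watts' figures can be sketched in Python. The function name `real_income` is an illustrative choice.

```python
def real_income(money_income, cpi):
    """Deflate money income by the CPI (base period = 100)."""
    return money_income / cpi * 100


income_2002 = real_income(35_000, 100.0)  # base period: real income equals money income
income_2013 = real_income(42_980, 122.8)  # 2013 income expressed in 2002 dollars

print(round(income_2002, 2), round(income_2013, 2))
```

Both results equal $35 000, which is the sense in which her standard of living is unchanged.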
self-review 15–4 The take-home pay of Jon Greene and the CPI for 2003 and 2013 are as follows:
Deflating Sales A price index can also be used to “deflate” sales or similar money series.
Deflated sales are determined by:
USING AN INDEX AS A DEFLATOR   \[ \text{Deflated sales} = \frac{\text{Actual sales}}{\text{An appropriate index}} \times 100 \]   [15–9]
Example Sam’s Enterprises has retail stores in Victoria and Collingwood. Sales in 2003 were $445 873
and $775 995, respectively. Last year, sales were $773 998 and $973 545, respectively. Sam
wants to know how much sales have increased over the last 11 years, so he decides to deflate
the sales for last year to the 2003 levels. Given that the industry index last year was 122.3
(2003 = 100), express Sam’s sales last year in constant 2003 dollars.
Sam’s Enterprises (index last year = 122.3)
                    Sales                 Constant Dollars
               2003       Last year       (2003)             Found by
Collingwood   775 995      973 545         796 030           973545/122.3*100
Victoria      445 873      773 998         632 868           773998/122.3*100
Comparing the sales for 2003 to the constant dollars, we see that sales grew in both locations
from 2003 to last year.
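The deflation in formula (15–9) can be sketched in Python with the Sam's Enterprises figures. The dictionary names are illustrative.

```python
INDUSTRY_INDEX = 122.3  # last year's index, as given in the example

sales_2003 = {"Collingwood": 775_995, "Victoria": 445_873}
sales_last_year = {"Collingwood": 973_545, "Victoria": 773_998}

# Formula (15-9): actual sales divided by the index, times 100.
deflated = {store: round(sales / INDUSTRY_INDEX * 100)
            for store, sales in sales_last_year.items()}

for store, constant_dollars in deflated.items():
    real_growth = constant_dollars - sales_2003[store]
    print(f"{store}: {constant_dollars} constant dollars, real growth {real_growth}")
```

A positive real growth figure means sales rose by more than the index did.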
Purchasing Power of the Dollar The CPI is also used to determine the purchasing power of
the dollar.
Example Suppose the CPI this month is 125.0 (2002 = 100). What is the purchasing power of the dollar?
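The worked solution and formula are not reproduced in these notes. The Python sketch below assumes the usual definition, purchasing power of the dollar = ($1 / CPI) × 100; that formula and the function name are assumptions made here, not taken from the surviving text.

```python
def purchasing_power(cpi, base_value=100.0):
    """Value of one dollar in base-period dollars (assumed: $1 / CPI * 100)."""
    return 1.0 / cpi * base_value


power = purchasing_power(125.0)
print(f"${power:.2f}")
```

Under this assumption, a CPI of 125.0 means a dollar buys what 80 cents bought in the base period.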
Cost-of-Living Adjustments The CPI is also the basis for cost-of-living adjustments, or COLA,
in many management–union contracts. The specific clause in the contract is often referred to as
the “escalator clause.” Many workers have their incomes or pensions pegged to the CPI.
The CPI is also used to adjust alimony and child support payments; attorneys’ fees; work-
ers’ compensation payments; rentals on apartments, homes, and office buildings; welfare pay-
ments; and so on. In brief, say, a retiree receives a pension of $500 a month and the CPI
increases by 5 points from 165 to 170. Suppose that for each point that the CPI increases, the
pension benefits increase 1%, so the monthly increase in benefits will be $25, found by $500
(5 points)(0.01). Now the retiree will receive $525 per month.
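The escalator arithmetic in the pension illustration can be sketched in Python. The 1%-per-point rule is the one supposed in the paragraph; the function and parameter names are illustrative.

```python
def adjusted_pension(pension, cpi_old, cpi_new, rate_per_point=0.01):
    """Raise the pension by rate_per_point for each point the CPI increases."""
    points = cpi_new - cpi_old
    increase = pension * points * rate_per_point
    return pension + increase


new_pension = adjusted_pension(500, 165, 170)
print(round(new_pension, 2))
```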
self-review 15–5 Suppose that the CPI for the latest month is 134.0 (2002 = 100). What is the purchasing power of the
dollar? Interpret.
A problem arises, however, when two or more series being compared do not have the
same base period. The following example compares price changes in the S&P/TSX Composite
Index and the DJIA.
Example We want to compare the price changes in the S&P/TSX Composite Index and the DJIA. The
two indexes from 2004 to 2013 follow. The information is reported at the end of December for
each year. (See Connect for the file Stock Indexes.)
Solution From the information given, we are not sure that the base periods are the same. Hence, a direct
comparison is not appropriate. Because we want to compare the changes in the two business
indexes, the logical approach is to let a particular period, say, December 2004, be the base for
both indexes. For the S&P/TSX Composite Index, the base is 9246.65, and for the DJIA, the
base is 10 783.01.
The calculation of the index for the S&P/TSX Composite Index for December 2010 is:
\[ \text{Index} = \frac{13\,443.22}{9246.65}(100) = 145.4 \]
The following Excel output shows the complete set of indexes:
We conclude that both indexes have increased over the period. The S&P/TSX Composite Index
has increased 47.3% over the time period, and the DJIA has increased 53.7% over the same period.
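Shifting a series to a common base can be sketched in Python. Only the two S&P/TSX values quoted in the solution are used, because the full data file is not reproduced in these notes; the function name `rebase` and the dictionary keys are illustrative.

```python
def rebase(series, base_key):
    """Re-express every value as a percentage of the value at base_key."""
    base = series[base_key]
    return {key: value / base * 100 for key, value in series.items()}


# S&P/TSX Composite Index values quoted in the text.
tsx = {"Dec 2004": 9246.65, "Dec 2010": 13443.22}

tsx_rebased = rebase(tsx, "Dec 2004")
print({k: round(v, 1) for k, v in tsx_rebased.items()})
```

Applying the same function to the DJIA with its December 2004 value (10 783.01) as the base puts both series on a comparable footing.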
self-review 15–6 The following table shows the average earnings by gender of Canadian workers:
The changes in earnings for men and women are to be compared. Unfortunately, the base-period (2000)
earnings differ for the two groups: $27 500 for women and $44 600 for men. Calculate the indexes
for both groups and interpret the findings.
EXERCISES
11. In 2002, Marilyn started working for $600 per week. How much would she have to earn in 2013 to
have the same purchasing power if the CPI is 122.8 in 2013? Use 2002 as the base year.
12. The price of a pair of boots in 2006 was $125, and $150 in 2014. During the same period, the CPI for
clothing and footwear increased by 3.1%. Did the price of the boots increase more than, the same, or
less than the CPI?
13. At the end of 2013, the average salary for a senior customer service representative at Mercury Distri-
bution Inc. was $48 500. The CPI for 2013 was 122.8 (2002 = 100.0). The mean salary for the same
position in the base period of 2002 was $39 000. What was the real income of the customer service
representative in 2013? How much had the average salary increased?
14. The Trade Union Association maintains indexes on the hourly wages for a number of the trades.
Unfortunately, the indexes do not all have the same base periods. Listed below is information on
plumbers and electricians. Shift the base periods to 2000, and compare the hourly wage increases.
15. In 1998, the mean salary of plant workers at Mercury Distribution Inc. was $26 650. The salary
included bonuses and overtime. By 2003, the mean salary increased to $31 972, and it was further
increased to $36 382 in 2008, $37 269 in 2011, and $39 500 in 2014. The company maintains
information on employment trends throughout its industry. Its industry index, which has a base of 1998,
was 122.5 for 2003, 136.9 for 2008, 144.9 for 2011, and 146.0 in 2014. Compare Mercury
Distribution Inc.’s plant workers’ salaries to the industry trends.
16. Sam Steward is a freelance Web page designer. His yearly wages for the years 2009 through 2014 are
listed below. An industry index for computer programmers that reports the rate of wage inflation in
the industry is also included. This index has a base of 1998.
Compute Sam’s real income for the period. Did his wages match the increase or decline in the industry?
Chapter Summary
I. An index number measures the relative change from one period to another.
A. The major characteristics of an index are as follows:
1. It is a percentage, but the percent sign is usually omitted.
2. It has a base period.
3. Most indexes are reported to the nearest tenth of a percent, such as 153.1.
4. The base of most indexes is 100.
B. The reasons for computing an index are as follows:
1. It facilitates the comparison of unlike series.
2. If the numbers are very large, often it is easier to comprehend the change of the index than
the actual numbers.
II. There are two types of price indexes—unweighted and weighted.
A. In an unweighted index, we do not consider the quantities.
1. In a simple index, we compare the base period to the given period.
$$I = \frac{p_t}{p_0} \times 100 \tag{15–1}$$
where $p_t$ refers to the price in the current period, and $p_0$ is the price in the base period.
2. In the simple average of price indexes, we add the simple indexes for each item and divide
by the number of items.
$$P = \frac{\sum P_i}{n} \tag{15–2}$$
3. In a simple aggregate price index, the price of the items in the group are totalled for both
periods and compared.
$$P = \frac{\sum p_t}{\sum p_0} \times 100 \tag{15–3}$$
B. In a weighted index, the quantities are considered.
1. In the Laspeyres method, the base period quantities are used in both the base period and the
given period.
$$P = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100 \tag{15–4}$$
2. In the Paasche method, current period quantities are used.
$$P = \frac{\sum p_t q_t}{\sum p_0 q_t} \times 100 \tag{15–5}$$
3. Fisher’s ideal index is the geometric mean of the Laspeyres’ index and Paasche’s index.
$$\text{Fisher’s ideal index} = \sqrt{(\text{Laspeyres’ index})(\text{Paasche’s index})} \tag{15–6}$$
C. A value index uses both base-period and current-period prices and quantities.
$$V = \frac{\sum p_t q_t}{\sum p_0 q_0} \times 100 \tag{15–7}$$
III. The most widely reported index is the Consumer Price Index (CPI).
A. It is often used to show the rate of inflation.
B. It is reported monthly by Statistics Canada.
C. The base year for 2010 is 2002 = 100.0, changed from 1992 = 100.0 in January 2002.
Statistics in Action
In the 1920s, wholesale prices in Germany increased dramatically. In 1920, wholesale prices increased
by about 80%; in 1921, the rate of increase was 140%; and in 1922, it was a whopping 4100%! Between
December 1922 and November 1923, wholesale prices increased by another 4100%. By that time,
government printing presses could not keep up, even by printing notes as large as 500 million marks.
Stories are told that workers were paid daily and then twice daily so that their wives could shop for
necessities before the wages became too devalued.
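The index formulas [15–1] to [15–7] in the Chapter Summary can be sketched in Python. This is a minimal illustration, not the textbook's own software: every function name is made up for this sketch, and the two-item data set is taken from the figures in the answer to Self-Review 15–2, assuming parts (c) and (d) of that answer are the Laspeyres and Paasche calculations.

```python
from math import sqrt


def total(prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def simple_index(p_t, p_0):
    """[15-1] Simple index for a single item."""
    return p_t / p_0 * 100


def simple_average(p_t, p_0):
    """[15-2] Mean of the items' simple indexes."""
    return sum(simple_index(t, b) for t, b in zip(p_t, p_0)) / len(p_t)


def simple_aggregate(p_t, p_0):
    """[15-3] Total of current prices over total of base prices."""
    return sum(p_t) / sum(p_0) * 100


def laspeyres(p_t, p_0, q_0):
    """[15-4] Base-period quantities are the weights."""
    return total(p_t, q_0) / total(p_0, q_0) * 100


def paasche(p_t, p_0, q_t):
    """[15-5] Current-period quantities are the weights."""
    return total(p_t, q_t) / total(p_0, q_t) * 100


def fisher(p_t, p_0, q_0, q_t):
    """[15-6] Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(p_t, p_0, q_0) * paasche(p_t, p_0, q_t))


def value_index(p_t, p_0, q_0, q_t):
    """[15-7] Current-period value over base-period value."""
    return total(p_t, q_t) / total(p_0, q_0) * 100


# Two items, figures as they appear in the answer to Self-Review 15-2.
p_0, q_0 = [75, 40], [500, 1200]   # base-period prices and quantities
p_t, q_t = [85, 45], [520, 1300]   # current-period prices and quantities

print(round(simple_average(p_t, p_0), 1))    # 112.9
print(round(simple_aggregate(p_t, p_0), 1))  # 113.0
print(round(laspeyres(p_t, p_0, q_0), 1))    # 112.9
print(round(paasche(p_t, p_0, q_t), 1))      # 112.9
print(round(fisher(p_t, p_0, q_0, q_t), 1))  # 112.9
```

The only difference between the two weighted price indexes is which period's quantities are passed in as weights. Fisher's index needs both sets because it combines the two results.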
Chapter Exercises
The following information is from the CREA file. (See Connect for the data file and source.)
17. Refer to the table above. Use the National Average as the base period, and compute a simple index for each city
for Jan-14. Interpret your findings.
18. Refer to the table above. Use the National Average as the base period, and compute a simple index for each city
for Jan-10. Interpret your findings.
19. Refer to the table above. Use the National Average as the base period, and compute a simple index for each city
for Jan-08. Interpret your findings.
20. Refer to the table above. Use the data from Jan-14 for Vancouver, Calgary, and Saskatoon as the base period,
and compute a simple index for each city for Jan-14. Interpret your findings.
21. Refer to the table above. Use the data from Jan-14 for Calgary and Saskatoon as the base period, and compute a
simple index for each city for Jan-14. Interpret your findings.
22. Refer to the table above. Compare Jan-14 with Jan-08 for the national average and each city. Which city
increased the most?
The following information on BlackBerry Limited’s stock prices is taken from the first trading day in March each
year. (See Connect for the data file and source.)
23. Compute a simple index for the closing price. Use Mar-01 as the base period. What can you conclude about the
change in the closing stock price over the period?
24. Compute a simple index for the closing price using Mar-03 as the base period. What can you conclude about
the change in the closing price over the period?
25. Compute a simple index for the closing price using the period Mar-04–Mar-06 as the base period. What can you
conclude about the change in the closing price over the period?
26. Compute a simple index for the closing price using the period Mar-06–Mar-10 as the base period. What can you
conclude about the change in the closing price over the period?
The following information was reported on food items for the years 2004 and 2014:
                                 2004                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Margarine (454 g)              0.81        18         2.39        27
Shortening (454 g)             0.84         5         1.49         9
Milk (2 liters [L])            1.44        70         3.79        65
Potato chips (454 g)           2.91        27         3.99        33
27. Compute a simple price index for each of the four items. Use 2004 as the base period.
28. Compute a simple aggregate price index. Use 2004 as the base period.
29. Compute Laspeyres’ price index for 2014 using 2004 as the base period.
30. Compute Paasche’s index for 2014 using 2004 as the base period.
31. Determine Fisher’s ideal index using the values for the Laspeyres and Paasche indexes computed in the two
previous problems.
32. Determine a value index for 2014 using 2004 as the base period.
Betts Electronics purchases three replacement parts for robotic machines used in its manufacturing process.
Information on the price of the replacement parts and the quantity purchased is given below:
                                 2005                   2014
Part                      Price ($)  Quantity    Price ($)  Quantity
RC-33                          0.50       320         0.60       340
SM-14                          1.20       110         0.90       130
WC-50                          0.85       230         1.00       250
33. Compute a simple price index for each of the three items. Use 2005 as the base period.
34. Compute a simple aggregate price index for 2014. Use 2005 as the base period.
35. Compute Laspeyres’ price index for 2014 using 2005 as the base period.
36. Compute Paasche’s index for 2014 using 2005 as the base period.
37. Determine Fisher’s ideal index using the values for the Laspeyres and Paasche indexes computed in the two
previous problems.
38. Determine a value index for 2014 using 2005 as the base period.
Prices for selected foods for 2005 and 2014 are given in the following table:
                                 2005                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Cabbage (500 g)                0.60      2000         0.90      1500
Carrots (bunch)                0.49       200         0.69       200
Peas (kilograms [kg])          1.99       400         2.99       500
Endive (bunch)                 0.89       100         1.29       200
39. Compute a simple price index for each of the four items. Use 2005 as the base period.
40. Compute a simple aggregate price index. Use 2005 as the base period.
41. Compute Laspeyres’ price index for 2014 using 2005 as the base period.
42. Compute Paasche’s index for 2014 using 2005 as the base period.
43. Determine Fisher’s ideal index using the values for the Laspeyres and Paasche indexes computed in the two
previous problems.
44. Determine a value index for 2014 using 2005 as the base period.
The prices of selected items for 2006 and 2014 are as follows. Quantity purchased is also listed.
                                 2006                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Paper, computer (pkg)          4.99       400         5.99       500
Paper, lined (pkg)             0.89      1000         0.99      1200
Paper, plain (pkg)             0.99       850         1.19      1000
Paper, coloured (pkg)          1.49       350         1.79       350
45. Compute a simple price index for each of the four items. Use 2006 as the base period.
46. Compute a simple aggregate price index. Use 2006 as the base period.
47. Compute Laspeyres’ price index for 2014 using 2006 as the base period.
48. Compute Paasche’s index for 2014 using 2006 as the base period.
49. Determine Fisher’s ideal index using the values for the Laspeyres and Paasche indexes computed in the two
previous problems.
50. Determine a value index for 2014 using 2006 as the base period.
51. A special-purpose index is to be designed to monitor the overall economy of the region. Four key series were
selected. After considerable deliberation it was decided to weight retail sales 20%, total bank deposits 10%, industrial
production in the region 40%, and nonagricultural employment 30%. The data for 2006 and 2014 are as follows:
        Retail Sales    Bank Deposits    Industrial Production
Year    ($ millions)    ($ billions)     (2003 = 100)             Employment
2006          1159.0              87                     110.6     1 214 000
2014          1971.0              91                     114.7     1 501 000
Construct a special-purpose index for 2014 using 2006 as the base period, and interpret.
52. M Studios is studying its revenue to determine where its greatest growth has been. The business started
10 years ago, and a summary of sales is given below:
a. Make whatever calculations are necessary to compare the trend in revenue from 2007 to 2013.
b. Interpret.
53. The management of Ingalls Super Discount stores wants to construct an index of economic activity for its
metropolitan area. Management contends that if the index reveals that the economy is slowing down,
inventory should be kept at a low level.
Three series seem to hold promise as predictors of economic activity: area retail sales, bank deposits, and
employment. All of these data can be secured monthly from the government. Retail sales is to be weighted 40%, bank
deposits 35%, and employment 25%. Seasonally adjusted data for the first three months of the year are as follows:
Construct an index of economic activity for each of the three months, using January as the base period.
54. The following table gives information on the CPI and the monthly take-home pay of Bill Martin, an employee
at the Jeep Corporation.
a. What is the purchasing power of the dollar for 2007 based on the period 2002?
b. Determine Mr. Martin’s “real” monthly income for 2007.
c. What is the purchasing power of the dollar for 2010 based on the period 2002?
d. Determine Mr. Martin’s “real” monthly income for 2010.
55. WSD Bank Inc. reported $17 446 (million) in commercial loans in 2000, $19 989 in 2002, $21 468 in 2004,
$21 685 in 2005, $15 922 in 2007, $18 375 in 2009, and $54 818 in 2014. Using 2000 as the base, develop
a simple index for the change in the amount of commercial loans for the years 2002, 2004, 2005, 2007, 2009,
and 2014.
The following are the quantities and prices for the years 2005 and 2014 for Kinzua Valley Geriatrics. 2005 is the
base period. Use this information for exercises 56 and 57.
                                 2005                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Syringes (dozen)               6.10      1500         6.83      2000
Thermometers                   8.10        10         9.35        12
Pain medication (bottle)       4.00       250         4.62       250
Patient record forms           6.00      1000         6.85       900
Computer paper (box)          12.00        30        13.65        40
59. In 2009, the mean salary for a marketing director with a bachelor’s degree was $89 673. The CPI for 2009
was 114.4. The mean annual salary for a marketing director in the base period of 2002 (2002 = 100.0) was
$69 800. What was the real income of the marketing director in 2009? How much had the mean salary
increased?
60. The prices and quantities of various items sold at the Accessory Shop in July 2007 and July 2014 are as follows:
                                 2007                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Handbags                      49.00      1500        79.00      2000
Gloves                        25.00        10        30.00        12
Umbrellas                     14.00       250        18.00       250
Scarves                       21.00      1000        25.00       900
Hats                          22.00       325        35.00       525
Determine the value index for July 2014 using July 2007 as the base year. Interpret the index.
61. The following table gives information on the CPI and the yearly salary of Simone Smith:
a. What is the purchasing power of the dollar for 2013 based on the period 2002?
b. Determine Simone Smith’s “real” yearly salary for 2010. Interpret the result.
c. What is the purchasing power of the dollar for 2013 based on the period 2006?
d. Determine Simone Smith’s “real” yearly salary for 2013. Interpret the result.
The following are the quantities and prices for the years 2009 and 2014 for Nine Thirty Photography (2009 = 100).
Use this information for exercises 62 to 64.
                                 2009                   2014
Item                      Price ($)  Quantity    Price ($)  Quantity
Camera                       825.00       300       975.00       500
Lens                         125.00       200       175.00       250
Case                          20.00       250        28.00       250
Lights                        21.00      1000        25.00       900
Storage                      110.00       325       110.00       525
65. a. Use the file Stock Indexes on Connect to compare the price changes in the S&P/TSX Composite Index and
the NASDAQ from 2001 to 2013. Interpret your findings.
b. Use the file Stock Indexes on Connect to compare the price changes in the S&P/TSX Composite Index and
the S&P 500 from 2001 to 2013. Interpret your findings.
c. Use the file Stock Indexes on Connect to compare the price changes in the S&P/TSX Venture Index and the
NASDAQ from 2001 to 2013. Interpret your findings.
d. Use the file Stock Indexes on Connect to compare the price changes in the S&P/TSX Venture Index and the
S&P 500 from 2001 to 2013. Interpret your findings.
Practice Test
Part I Objective
1. To compute an index, the base period is always the ______ (numerator, denominator, can be in
either, always 100).
2. A number that measures the relative change from one period to another is called a/an ______.
3. In a weighted index, both the price and the ______ are considered.
4. In a Laspeyres index, the ______ quantities are used in both the numerator and denominator (base
period, given period, oldest, newest).
5. The current base period for the CPI is ______.
Part II Problems
1. The sales at Roberta’s Ice Cream Stand for the last five years are as follows:
Year Sales
2007 $130 000
2008 145 000
2009 120 000
2010 170 000
2011 190 000
a. Find the simple index for each year using 2007 as the base year.
b. Find the simple index for each year using 2007–2008 as the base year.
2. The prices and quantities of several golf items purchased by members of the men’s golf league at the Osler
Bluffs Golf and Tennis Club are as follows:
                                 2006                   2011
Item                      Price ($)  Quantity    Price ($)  Quantity
Driver                       250.00         5       275.00         6
Putter                        60.00        12        75.00        10
Iron                         700.00         3       750.00         4
a. Determine the simple aggregate price index, with 2006 as the base period.
b. Determine Laspeyres’ price index.
c. Determine Paasche’s price index.
d. Determine the value index.
Answers to Self-Reviews
15–1 1.
Nation            Amount (millions of tonnes)    Index
China                                   779.0    896.4
European Union                          165.8    190.8
Japan                                   110.6    127.3
United States                            86.9    100.0
India                                    81.2     93.4
China produces 796.4% more steel than the United States.
2. (a)
Year    Average Weekly Earnings    Index
2008                    $742.69    100.0
2009                     770.30    103.7
2010                     787.37    106.0
2011                     808.69    108.9
2012                     816.48    109.9
Wages have increased by about 10% (9.9%) from 2008 to 2012.
(b) Base = (770.30 + 787.37 + 808.69)/3 = 788.79.
Year    Average Weekly Earnings    Index
2008                    $742.69     94.2
2009                     770.30     97.7
2010                     787.37     99.8
2011                     808.69    102.5
2012                     816.48    103.5
(c) (808.69/770.30)(100) = 105.0.
15–2 (a) P1 = ($85/$75)(100) = 113.3.
P2 = ($45/$40)(100) = 112.5.
P = (113.3 + 112.5)/2 = 112.9.
(b) P = ($130/$115)(100) = 113.0.
(c) $$P = \frac{\$85(500) + \$45(1200)}{\$75(500) + \$40(1200)}(100) = \frac{\$96\,500}{\$85\,500}(100) = 112.9$$
(d) $$P = \frac{\$85(520) + \$45(1300)}{\$75(520) + \$40(1300)}(100) = \frac{\$102\,700}{\$91\,000}(100) = 112.9$$
(e) $$P = \sqrt{(112.9)(112.9)} = 112.9$$
15–3 (a) $$P = \frac{\$4(9000) + \$5(200) + \$8(5000)}{\$3(10\,000) + \$1(600) + \$10(3000)}(100) = \frac{\$77\,000}{\$60\,600}(100) = 127.1$$
(b) The value of sales has gone up 27.1% from 2004 to 2014.
15–4 (a) $34 046.69, found by: (35 000/102.8)(100).
(b) In terms of the base period, Jon’s salary was $34 046.69 in 2003 and $36 074.92 in 2013.
(c) This indicates his take-home pay increased at a slightly faster rate than the price paid for food,
transportation, and so on.
15–5 $0.75, found by: ($1.00/134.0)(100). A 2002 dollar is worth only $0.75 this month.
15–6
Year     Women    Index       Men    Index
2000    27 500    100.0    44 600    100.0
2001    27 600    100.4    44 500     99.8
2002    29 300    106.5    46 700    104.7
2003    29 000    105.5    46 000    103.1
2004    29 400    106.9    46 200    103.6
2005    30 000    109.1    46 900    105.2
2006    30 500    110.9    47 100    105.6
2007    31 300    113.8    47 800    107.2
2008    31 700    115.3    49 300    110.5
2009    32 600    118.5    47 400    106.3
2010    32 600    118.5    47 800    107.2
2011    32 100    116.7    48 100    107.8
The average earnings for men increased by 7.8% over the time period and increased 16.7% for women over
the same time period.
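The deflation arithmetic behind the answers to Self-Reviews 15–4 and 15–5 can be sketched in Python. This is a minimal illustration rather than part of the original answer key: the function names `real_income` and `purchasing_power` are made up for this sketch, and both assume a CPI expressed with its base period equal to 100.

```python
def real_income(money_income, cpi):
    """Deflate a money income to base-period dollars (CPI base = 100)."""
    return money_income / cpi * 100


def purchasing_power(cpi):
    """Value of one current dollar in base-period dollars (CPI base = 100)."""
    return 1 / cpi * 100


# Figures quoted in the answers to Self-Reviews 15-4(a) and 15-5.
print(round(real_income(35_000, 102.8), 2))  # 34046.69
print(round(purchasing_power(134.0), 2))     # 0.75
```

Both functions divide by the CPI and multiply by 100, so a CPI above 100 always yields a real income below the money income and a dollar worth less than $1.00 in base-period terms.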