Students Tutorial Answers Week2

BES Tutorial Sample Solutions, S2/10
These solutions will be posted gradually on the BES website, with a one-week
delay and without the tutor notes denoted by TN.

WEEK 2 TUTORIAL EXERCISES
Guide to solutions
1. What is meant by a variable in a statistical sense?
Distinguish between qualitative and quantitative
statistical variables, and between continuous and
discrete variables. Give examples.
A variable in a statistical sense is a characteristic of an
object that may take different values.
Data on a quantitative variable can be expressed
numerically in a meaningful way (e.g. height of an
individual, number of children in a family). Data on a
qualitative variable cannot be expressed numerically in
a meaningful way (e.g. sex of an individual, hair colour).
A discrete quantitative variable can assume only certain
discrete numerical values on the number line (there can be
a finite or infinite number of these values); e.g. number of
children in a family. A continuous quantitative variable can
assume any value in a specific range or interval; e.g. length
of a pipe.
2. Distinguish between (a) a statistical population and a
sample; (b) a parameter and a statistic. Give examples.
A statistical population is the set of measurements or
observations of a characteristic of interest for all
elementary units in a frame; e.g. the shoe sizes of all men
in Australia. A statistical sample is a subset of a
population; e.g. the shoe sizes of all the men in the class
are a sample from the population of the shoe sizes of all
men in Australia.
A parameter is a numerical description of a population.
For example, the average shoe size of all Australian men
is a parameter (of the population of the shoe sizes of all
Australian men). A statistic is a numerical description of
a sample. For example, the average shoe size of all men
in this classroom is a statistic (calculated from the
sample of the shoe sizes of all men in this classroom).
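The distinction can be illustrated with a small simulation. This is only a sketch: the population below is made up (1,000 randomly generated shoe sizes), and the sample size of 30 is an arbitrary choice, so none of the numbers come from the tutorial.

```python
import random
from statistics import mean

# Hypothetical population: shoe sizes of 1,000 men (made-up values).
random.seed(0)
population = [random.choice([7, 8, 9, 10, 11, 12]) for _ in range(1000)]

# Parameter: a numerical description computed from every unit in the population.
population_mean = mean(population)

# Statistic: the same description computed from a sample
# (here 30 units drawn at random without replacement).
sample = random.sample(population, 30)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The parameter is a single fixed number for a given population, whereas the statistic changes each time a different sample is drawn.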
3. In order to know the market better, the second-hand
car dealership, Anzac Garage, wants to analyze the
age of second-hand cars being sold. A sample of 20
advertisements for passenger cars is selected from the
second-hand car advertising/listing website
www.drive.com.au. The ages of the vehicles at time of
advertisement are listed below:
5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11
(a) Calculate frequency, cumulative frequency and
relative frequency distributions for the age data using
the following bin classes:
More than 0 to less than or equal to 8 years
More than 8 to less than or equal to 16 years
More than 16 to less than or equal to 24 years.
Bin              Frequency   Relative Frequency   Cumulative Frequency
0 < Age ≤ 8      14          0.70                 14
8 < Age ≤ 16     5           0.25                 19
16 < Age ≤ 24    1           0.05                 20
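The table can be checked with a few lines of Python. This is a minimal sketch rather than the method used in the tutorial (which refers to EXCEL); the function name frequency_table is our own, and the bins are treated as right-closed intervals, as the question specifies.

```python
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]
upper_limits = [8, 16, 24]  # bins: (0, 8], (8, 16], (16, 24]

def frequency_table(data, upper_limits, lower=0):
    """Return rows of (upper limit, frequency, relative freq., cumulative freq.)."""
    rows = []
    cumulative = 0
    for upper in upper_limits:
        # Right-closed bin: lower < x <= upper.
        freq = sum(1 for x in data if lower < x <= upper)
        cumulative += freq
        rows.append((upper, freq, freq / len(data), cumulative))
        lower = upper
    return rows

table = frequency_table(ages, upper_limits)
for upper, freq, rel, cum in table:
    print(f"<= {upper:>2}  freq={freq:>2}  rel={rel:.2f}  cum={cum:>2}")
```

The relative frequency is the frequency divided by the sample size (20), and the cumulative frequency is the running total of the frequencies.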
(b) Sketch a frequency histogram using the calculations in
part (a). What can you say about the distribution of
the age of these second-hand cars? Is there anything
wrong with the frequency table and histogram?
Specifically, is the choice of bin classes appropriate?
What needs to be done?
[Figure: Relative frequency histogram for Age. Horizontal axis: Bin (upper limits 8, 16, 24); vertical axis: Frequency (relative, 0 to 0.8).]
From this graph (it was not necessary to use EXCEL,
although it is good practice), the Age distribution
appears to be skewed to the right. 70% of the cars are
8 years old or younger. However, this histogram
provides only limited information about the Age distribution
because there are too few bins and they are very wide.
Narrower bins are needed.
(c) Halve the width of the bins (0 to 4, 4 to 8, etc) and
recalculate the frequency, cumulative frequency and
relative frequency distributions. Using the new
distributions and histogram, what can you now say
about the distribution of the age of second-hand cars?
Bin              Frequency   Relative Frequency   Cumulative Frequency
0 < Age ≤ 4      5           0.25                 5
4 < Age ≤ 8      9           0.45                 14
8 < Age ≤ 12     4           0.20                 18
12 < Age ≤ 16    1           0.05                 19
16 < Age ≤ 20    0           0.00                 19
20 < Age ≤ 24    1           0.05                 20
[Figure 3.1: Revised histogram for age of cars. Horizontal axis: Age; vertical axis: Frequency (0 to 10).]
There still appears to be a skew to the right, but now we
can also see that there is an outlier in the 21 to 24 Age
category. Ages of 5 to 8 years are the most frequently
observed. A sizable proportion of the second-hand cars are
relatively new (25% are less than or equal to 4 years old).
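These observations can be verified directly from the data. The sketch below is one possible approach, not the tutorial's own: it uses bisect_left from the Python standard library to place each age in a right-closed bin of width 4, then prints a rough text histogram.

```python
from bisect import bisect_left

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]
upper_limits = [4, 8, 12, 16, 20, 24]  # bins: (0, 4], (4, 8], ..., (20, 24]

# bisect_left returns the index of the first upper limit >= age,
# which is the right-closed bin that contains the age.
counts = [0] * len(upper_limits)
for age in ages:
    counts[bisect_left(upper_limits, age)] += 1

# Text histogram: one '#' per car.
for upper, count in zip(upper_limits, counts):
    print(f"{upper - 4:>2} < Age <= {upper:>2} | {'#' * count}")

modal_upper = upper_limits[counts.index(max(counts))]
share_up_to_4 = counts[0] / len(ages)
print(f"most frequent bin ends at {modal_upper} years")
print(f"share aged 4 years or less: {share_up_to_4:.0%}")
```

The empty bin between 16 and 20 years is what makes the single 24-year-old car stand out as an outlier.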
4. Management of a major bank has asked the Human
Resources Department to provide an analysis of sick
leave taken by the staff of one of their branches. The
days taken as sick leave in the last calendar year for
all 25 branch employees were:
0, 10, 9, 5, 0, 0, 5, 10, 0, 0, 10, 1, 0, 0, 0, 0, 10, 5, 10, 45, 0, 2, 1, 0, 5
a. What are the key features of these data?
It's a bit difficult to tell from the raw list. Even ordering
the observations would help. But clearly 45 is an outlying
observation (large relative to the others).
At the other extreme, there are several employees
who haven't taken any sick days.
b. Calculate the frequency distribution and sketch the
frequency histogram. Does this provide any extra
information to summarize these data relative to what
you observed in (a)?
Bin     Frequency
0       11
1       2
2       1
3       0
4       0
5       4
6       0
7       0
8       0
9       1
10      5
More    1
[Figure: Histogram of sick-leave days. Horizontal axis: bin (0 to 10, More); vertical axis: Frequency (0 to 12).]
The clustering of the data is now very clear. Over half
of the employees took very few sick days (14 of the 25 took
two days or fewer). A few (4) took 5 days, which may
represent a single one-week break because of a more serious
illness, and finally quite a few (5) took 10 days, i.e. two
working weeks off.
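The frequency distribution can be reproduced with collections.Counter. This is a minimal sketch assuming one bin per day from 0 to 10 plus a "More" bin for anything above 10, which matches the bin labels in the tutorial's table; the variable names are our own.

```python
from collections import Counter

sick_days = [0, 10, 9, 5, 0, 0, 5, 10, 0, 0, 10, 1, 0, 0, 0, 0, 10, 5, 10,
             45, 0, 2, 1, 0, 5]

counts = Counter(sick_days)

# One bin per day from 0 to 10, plus a "More" bin for values above 10.
table = [(str(day), counts[day]) for day in range(11)]
table.append(("More", sum(n for day, n in counts.items() if day > 10)))

# Print the table with a rough text histogram alongside.
for label, freq in table:
    print(f"{label:>4}  {freq:>2}  {'#' * freq}")

# Share of employees who took at most two sick days.
few_days_share = sum(counts[day] for day in (0, 1, 2)) / len(sick_days)
print(f"share taking at most 2 days: {few_days_share:.0%}")
```

A Counter returns 0 for days that never occur, so the empty bins (3, 4, 6, 7, 8) appear in the table without any special handling.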
c. What do you think would be the best way to
summarize the data for Management?
While the histogram is a reasonable representation, the
verbal description in (b) is also a good summary in this
case.
d. Is this an analysis of a population or a sample?
It depends on what the problem is. It could be a population if
interest is confined to this branch in this year.
Alternatively, it could be a sample if management is
interested in all branches, or in this particular branch over
time.
5. SIA: Health expenditure

A recent report by Access Economics provides a
comparison of Australian expenditures on health with
that of comparable OECD countries. Data from that
report relating to 2005 have been used to reproduce
their Figure 2.2 (below denoted as Figure 2.1).
(a) What are the key features of these data?
A strong positive association: more per capita
GDP goes with more Health Expenditure per capita.
There are (at least) 2 outliers: the observation with
the largest Health Expenditure (USA) and
the observation with the highest GDP (Luxembourg).
Without these 2, the relationship is approximately
linear. With them, there is a suggestion of a nonlinear relationship.
There is an indication of more variability in health
expenditures when GDP is larger.
(b) While this is a bivariate scatter plot, there are three
variables involved: health expenditure, GDP and
population. Why account for population by expressing
health expenditure and GDP in per capita terms?
[Figure 2.1: OECD Health Expenditure and GDP. Scatter plot; horizontal axis: GDP per capita (US$000), 0 to 70; vertical axis: Health expenditure per capita (US$000), 0 to 7.]
This is a recognition that there may be factors other than
GDP associated with Health Expenditures, and
population size is one obvious factor. Expressing
everything in per capita terms is one way to control for
population variations and hence isolate the relationship
between GDP and Health Expenditure.
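A small numerical sketch shows why the per capita adjustment matters. The two countries and all figures below are hypothetical (they are not taken from the Access Economics report); they are chosen so that the totals differ by a factor of ten while the per capita values coincide.

```python
# Hypothetical figures for two imaginary countries (not from the report).
countries = {
    "Country A": {"population_m": 5, "gdp_bn": 200, "health_bn": 18},
    "Country B": {"population_m": 50, "gdp_bn": 2000, "health_bn": 180},
}

per_capita = {}
for name, c in countries.items():
    # US$ billions divided by millions of people gives US$ thousands per person.
    per_capita[name] = {
        "gdp_pc": c["gdp_bn"] / c["population_m"],
        "health_pc": c["health_bn"] / c["population_m"],
    }

for name, values in per_capita.items():
    print(f"{name}: GDP per capita = {values['gdp_pc']:.1f} (US$000), "
          f"health expenditure per capita = {values['health_pc']:.1f} (US$000)")
```

In total terms Country B spends ten times as much on health as Country A, purely because it has ten times the population. In per capita terms the two are identical, so any remaining differences in a per capita plot cannot be attributed to population size.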