
CHPT20 Histograms and Frequency Distribution Diagrams

Uploaded by

bendylan82
Copyright
© © All Rights Reserved
20 Histograms and frequency distribution diagrams

Key words
• Histogram
• Continuous
• Class interval
• Frequency
• Grouped
• Frequency tables
• Frequency density
• Modal class
• Cumulative frequency
• Cumulative frequency curve
• Quartiles
• Interquartile range
• Percentiles
In this chapter you will learn how to:
• construct and use histograms with equal intervals
• construct and use histograms with unequal intervals
• draw cumulative frequency tables
• use tables to construct cumulative frequency diagrams
• identify the modal class from a grouped frequency distribution.

The diagram on the top right of this digital camera screen is a type of histogram which shows how light and shadows are distributed in the photograph. The peaks at the left show that this photo (the red flower) is too dark (underexposed).
You have already collected, organised, summarised and displayed different sets of data using pie
charts, bar graphs and line graphs. In this section you are going to work with numerical data
(sets of data where the class intervals are numbers) to learn how to draw frequency distribution
diagrams called histograms and cumulative frequency curves.

Histograms are useful for visually showing patterns in large sets of numerical data. The shape of the
graph allows you to see where most of the measurements are located and how spread out they are.

Unit 5: Data handling

20.1 Histograms
A histogram is a specialised graph that looks a lot like a bar chart but is normally used to
show the distribution of continuous or grouped data.

Look at this histogram showing the ages of people visiting a gym.

[Histogram: Ages of people visiting a gym. Vertical axis: Frequency; horizontal axis: Age in years.]

Notice that:

• The horizontal scale is continuous and each column is drawn above a particular class interval.
• The frequency of the data is shown by the area of the bars.
• There are no spaces between the bars on the graph because the horizontal scale is continuous. (If the frequency relating to a class interval is 0, you won't draw a bar in that class, so there will be a gap in the bars in that case.)

Continuous data was introduced in chapter 4.

Histograms with equal class intervals

When the class intervals are equal, the bars are all the same width. Although it should be
the area of the bar that tells you the frequency of the class, it is common practice when class
intervals are all equal to just let the vertical scale show the frequency per class interval (and so it
is just labelled 'frequency', as in the diagram above).

The grouping of data into classes was covered in chapters 4 and 12.


Worked example 1

The table and histogram below show the heights of trees in a sample from a forestry site.

Height of tree (m)
0 ≤ h < 5
5 ≤ h < 10
10 ≤ h < 15
15 ≤ h < 20
20 ≤ h < 25
25 ≤ h < 30
30 ≤ h < 35
35 ≤ h < 40

[Histogram: vertical axis Frequency; horizontal axis Height of tree (m).]

(a) How many trees are less than 5 m tall?
(b) What is the most common height of tree?
(c) How many trees are 20 m or taller?
(d) Why do you think the class intervals include inequality symbols?
(e) Why is there a gap between the columns on the right-hand side of the graph?

(a) 21. Read the frequency (vertical scale on the histogram) for the bar 0 - 5.

(b) 25 ≤ h < 30 m. Find the tallest bar and read the class interval from the horizontal scale.

(c) 61. Find the frequency for each class with heights of 20 m or more and add them together.

(d) The horizontal scale of a histogram is continuous, so the class intervals are also continuous. The inequality symbols
prevent the same height of tree falling into more than one group. For example, without the symbols a tree of height 5 m
could go into two groups and thus be counted twice.

(e) The frequency for the class interval 30 ≤ h < 35 is zero, so no bar is drawn.

Worked example 2

Joy-Anne did an experiment in her class to see what mass of raisins (in grams) the
students could hold in one hand. Here are her results.

18 18 20 22 22 22 22 23
23 24 24 25 25 25 25 25
25 26 26 27 30 30 31 35

(a) Using the class intervals 16-20, 21-25, 26-30 and 31-35 draw a grouped frequency table.
(b) What is the modal class (the mode) of this data?
(c) Draw a histogram to show her results.

You learned how to draw grouped frequency tables in chapter 4.

(a) Count the number in each class to fill in the table.

Mass (g)    Frequency
16-20       3
21-25       14
26-30       5
31-35       2

(b) The modal class is 21-25. It is actually not possible to find the mode of grouped data because you do not have the
individual values within each group. Instead, you find the class interval that has the greatest frequency. This is called
the 'modal class'. (Extended students learned this in chapter 12.)

Remember from chapter 12 that the mode is the most frequent result.

(c) [Histogram: Mass of raisins per grab. Vertical axis: Frequency (0 to 16); horizontal axis: Mass of raisins (g).]

Although the data is in discrete groups, the raw data is actually continuous (it is mass). When Joy-Anne grouped the data,
she rounded each mass to the nearest gram. This means that some raisins will have an actual mass that is between two of
the discrete groups. To take this into account each bar is plotted according to its upper and lower bound. So, the group
16 - 20 is drawn from 15.5 ≤ h < 20.5 and so on, such that a handful of raisins with a mass of 20.50 g would be in the
class interval 21 - 25 because the group's boundaries are 20.5 - 25.5.

You learned about upper and lower bounds in chapter 13.
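The grouping in this worked example can also be checked with a short program. The following is only a sketch: the names `classes`, `class_of`, `frequency` and `modal_class` are made up for illustration, and each class is assumed to run from its lower bound up to (but not including) its upper bound, as described above.

```python
from collections import Counter

# Raw masses (g) from Joy-Anne's experiment, as listed in the worked example.
masses = [18, 18, 20, 22, 22, 22, 22, 23,
          23, 24, 24, 25, 25, 25, 25, 25,
          25, 26, 26, 27, 30, 30, 31, 35]

# Each class is (label, lower bound, upper bound).
# Assumption: a mass x belongs to a class when lower <= x < upper.
classes = [("16-20", 15.5, 20.5), ("21-25", 20.5, 25.5),
           ("26-30", 25.5, 30.5), ("31-35", 30.5, 35.5)]

def class_of(x):
    """Return the label of the class interval that contains x."""
    for label, lower, upper in classes:
        if lower <= x < upper:
            return label
    raise ValueError(f"{x} does not fall in any class interval")

# Grouped frequency table: how many masses fall in each class.
frequency = Counter(class_of(x) for x in masses)

# The modal class is the class with the greatest frequency.
modal_class = max(frequency, key=frequency.get)
```

Here `frequency` reproduces the table (3, 14, 5 and 2), `modal_class` is "21-25", and `class_of(20.5)` lands in 21-25, which matches the remark about a mass of 20.50 g.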

Exercise 20.1

Living maths

1 Maria is a midwife who recorded the mass of the babies she delivered in one month.

Mass (kg)        Frequency
0.5 ≤ m < 1.5    1
1.5 ≤ m < 2.5    12
2.5 ≤ m < 3.5    31
3.5 ≤ m < 4.5    16
4.5 ≤ m < 5.5    0

(a) What is the modal class?
(b) How many babies have a mass of 2.5 kg or less?
(c) Draw a histogram to show this distribution.

2 Annike did a breakdown of the length of telephone calls (t) on her mobile phone account.
These are her results.

Length of call (t)
0 ≤ t < 2
2 ≤ t < 4
4 ≤ t < 6
6 ≤ t < 8
8 ≤ t < 10
10 ≤ t < 12
12 ≤ t < 14
14 ≤ t < 16

(a) How many calls did she make altogether?
(b) What is the most common length of a call?
(c) Draw a histogram to show this distribution.
(d) Make a new frequency table of these results using the class intervals given opposite.
(e) Draw a histogram to show the new distribution.
(f) Write a few sentences comparing the distribution shown on the two histograms.

3 Shamiela cut 30 pieces of ribbon, which she estimated were each about 30 cm long. Her sister
measured them and got the following actual lengths in centimetres:

29.1 30.2 30.5 31.1 32.0 31.3 29.8 29.5 31.6 32.4
32.1 30.2 31.7 31.9 32.1 29.9 32.1 31.4 28.9 29.8
31.2 31.2 30.5 29.7 30.3 30.4 30.1 31.1 28.8 29.5

(a) Draw a suitable frequency distribution table for this data. Use an equal class interval.
(b) Construct a histogram to show your distribution.
(c) How accurately did Shamiela estimate? Give a reason for your answer.

4 The French Traffic Police recorded the number of vehicles speeding on a stretch of highway
on a Friday night. Draw a histogram to show this data.

Be careful of discrete groups of data; the raw data is continuous so can take any value between the groups.

5 Here are the IQ-test scores of a group of students.

[Table: IQ-test scores grouped in classes such as 110-114, 115-119 and 125-129, with their frequencies.]

Draw a histogram to show this distribution.

In some books, you might see discrete groups being used for grouped continuous data. Question 5 is a good example. In
these cases, draw the histogram by extending the boundaries of each class to make them continuous, e.g. the ranges
95-99 and 100-104 become 94.5 ≤ m < 99.5, 99.5 ≤ m < 104.5 etc. To draw a bar chart from this table you would treat
each class as a 'category' and draw the bars with gaps as normal.

Histograms with unequal class intervals

When the class intervals are not the same, using the height to give the frequency can be
misleading. A class that is twice the width of another but with the same frequency covers twice
the area. So, if the height is used to represent the frequency, the initial impression it gives is that it
contains more values, which is not necessarily the case (see worked example 3). To overcome this,
when the class intervals are unequal a new vertical scale is used called the frequency density.

$\text{frequency density} = \dfrac{\text{frequency}}{\text{class width}}$

Frequency density takes into account the frequency relative to the size of the class interval,
making it fairer when comparing different sized intervals.

Tip: Notice that f = fd × cw = area of a bar. You can use this to help you read frequencies from the
histogram. Many questions are based on this principle.
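A short calculation shows why frequency density is the fairer scale. The sketch below uses a made-up table (the class boundaries and frequencies are hypothetical and do not come from this chapter), and the helper name `frequency_density` is just an illustrative choice.

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency divided by class width."""
    return frequency / (upper - lower)

# Hypothetical table of (lower boundary, upper boundary, frequency).
table = [(0, 10, 20), (10, 15, 20), (15, 30, 30)]

# Height of each bar on the frequency density scale.
densities = [frequency_density(f, lo, hi) for lo, hi, f in table]

# The area of each bar (density x class width) gives back the frequency.
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, table)]
```

The first two classes have the same frequency (20), but the narrower class has twice the frequency density (4.0 compared with 2.0), so its bar is twice as tall. The areas come out as 20, 20 and 30, which are the original frequencies.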

First work out the frequency density by adding columns to your frequency distribution
table like this:

Height in cm
5 ≤ h < 15
15 ≤ h < 20
20 ≤ h < 25
25 ≤ h < 40

The heights in cm are the class intervals. The number of plants is the frequency.

If the data was plotted against frequency instead of frequency density (see below), it looks as
though there are more plants in the class 25 - 40 compared to the class 5 - 10 but actually, their
frequency densities are the same (see histogram in Worked example 3). The larger size of interval
is misleading here, so we use frequency density as it is a fairer way to compare frequencies in
classes of different sizes.

Next draw the axes. You will need to decide on a suitable scale for both the horizontal
and the vertical axes. Here, 1 cm has been used to represent 10 cm on the horizontal
axis (label height in cm) and 2 cm per unit on the vertical axis (label frequency density).
Once you have done this, draw the histogram, paying careful attention to the scales on
the axes.

[Histogram: vertical axis Frequency density; horizontal axis Height in cm.]


Exercise 20.2

1 140 people at a school fund-raising event were asked to guess how many sweets were in a
large glass jar. Those who guessed correctly were put into a draw to win the sweets as a prize.
The table shows the guesses.

Number of sweets (n)
100 ≤ n < 200
200 ≤ n < 250
250 ≤ n < 300
300 ≤ n < 350
350 ≤ n < 400
400 ≤ n < 500

(a) Use the table to calculate the frequency density for each class.
(b) Construct a histogram to display the results. Use a scale of 1 cm = 100 sweets on the
horizontal axis and a scale of 1 cm = 0.2 units on the vertical axis.

2 The table shows the mass of young children visiting a clinic (to the nearest kg).
Draw a histogram to illustrate the data.

Mass (m)
6 ≤ m < 9
9 ≤ m < 12
12 ≤ m < 18
18 ≤ m < 21
21 ≤ m < 30

3 The table shows the distribution of the masses of the actors in a theatre group.
Draw a histogram to show the data.

Mass (m)
60 ≤ m < 63
63 ≤ m < 64
64 ≤ m < 65
65 ≤ m < 66
66 ≤ m < 68
68 ≤ m < 72

Living maths

4 A group of on-duty soldiers underwent fitness tests in which their percentage body fat was
calculated. The fitness assessor drew up this histogram of the results.

[Histogram: Percentage body fat of soldiers. Horizontal axis: Percentage body fat (8 to 36).]

(a) How many soldiers were tested?
(b) How many soldiers had body fat levels within the healthy limits?
(c) How many soldiers had levels which were too high?
(d) Why do you think there is no bar in the 0-4 category?
(e) Would you expect a similar distribution if you tested a random selection of people in
your community? Give reasons for your answers.

5 A traffic officer used a computer program to draw this histogram showing the average speed
(in km/h) of a sample of vehicles using a highway. The road has a minimum speed limit
and a maximum speed limit of 125 km/h.

[Histogram: vertical axis Frequency density; horizontal axis Speed (km/h).]

(a) Is it easy to see how many vehicles travelled above or below the speed limit?
Give a reason for your answer.
(b) The traffic officer claims the graph shows that most people stick to the speed limit.
Is he correct? Give a reason for your answer.
(c) His colleagues want to know exactly how many vehicles travel below or above the speed limit.

(i) Reconstruct this frequency table. Round frequencies to the nearest whole number.

Speed (s)
0 ≤ s < 50
50 ≤ s < 65
65 ≤ s < 80
80 ≤ s < 95
95 ≤ s < 110
110 ≤ s < 125
125 ≤ s < 180

(ii) How many vehicles were below the minimum speed limit?
(d) What percentage of vehicles in this sample were exceeding the maximum speed limit?

20.2 Cumulative frequency
Sometimes you may be asked questions such as:

• How many people had a mass of less than 50 kilograms?
• How many cars were travelling above 100 km/h?
• How many students scored less than 50% on the test?

In statistics you can use a cumulative frequency table or a cumulative frequency curve to
answer questions about data up to a particular class boundary. You can also use the cumulative
frequencies to estimate and interpret the median and the value of other positions of a data set.

Cumulative means 'increasing as more is added'.

Height (h)
0 ≤ h < 5
5 ≤ h < 10
10 ≤ h < 15
15 ≤ h < 25
25 ≤ h < 50

(b) 15 ≤ h < 25. The heights are given for 250 flowers, so the median height must be the mean of the height of the
125th and 126th flower. If you look at the cumulative frequency you can see that this value falls into the fourth height
class (the 125th and 126th are both greater than 120 but less than 200).

In chapter 12, median classes were introduced for grouped data. You will see that cumulative frequency
curves will enable you to estimate the median when the number of data is large.
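The running total and the search for the median class can be sketched in a few lines. The class frequencies below are hypothetical: they are chosen only so that the cumulative frequencies pass through 120 and 200 and reach a total of 250, the figures quoted in the answer above. The names `labels`, `cumulative` and `class_containing` are illustrative.

```python
from itertools import accumulate

labels = ["0-5", "5-10", "10-15", "15-25", "25-50"]

# Hypothetical class frequencies (not taken from the original table).
frequencies = [20, 40, 60, 80, 50]

# Cumulative frequency: running total up to each upper class boundary.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]  # total number of flowers

def class_containing(position):
    """Return the first class whose cumulative frequency reaches position."""
    for label, running_total in zip(labels, cumulative):
        if position <= running_total:
            return label
    raise ValueError("position is larger than the total frequency")

# With an even total, the median lies between the (n/2)th and (n/2 + 1)th values.
median_classes = {class_containing(n // 2), class_containing(n // 2 + 1)}
```

With these figures `cumulative` is 20, 60, 120, 200, 250, and both the 125th and 126th values fall in the fourth class, so `median_classes` contains only "15-25".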

Cumulative frequency curves

When you plot the cumulative frequencies against the upper boundaries of each class interval
you get a cumulative frequency curve.

Cumulative frequency curves are also called ogive curves or ogives because they take the shape
of narrow pointed arches (called ogees) like these ones on a mosque in Dubai.

Tip: You must plot the cumulative frequency at the upper end point of the class interval. Do not
confuse this section with the mid-point calculations you used to estimate the mean in frequency tables.

In mathematics, arches like these are seen as two symmetrical s-curves.

n/2 = 25, so it is the 25th result; drop a perpendicular from where the line cuts the graph.

4. Read off the cumulative frequency at 10 minutes.

50 - 18 = 32. Subtract the cumulative frequency at 30 minutes, 18, from the total frequency.

42 - 28 = 14. Subtract the cumulative frequency at 40 minutes, 28, from that at 60 minutes, 42.
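The subtractions in these answers follow one pattern, which the sketch below makes explicit. The cumulative frequencies (4, 18, 28 and 42) and the total of 50 are the values read off in the answers above; the names `cumulative_at`, `count_above` and `count_between` are illustrative only.

```python
# Cumulative frequencies read from the curve: time in minutes -> running total.
cumulative_at = {10: 4, 30: 18, 40: 28, 60: 42}
total = 50  # total frequency

def count_above(time):
    """How many results are greater than the given time."""
    return total - cumulative_at[time]

def count_between(lower, upper):
    """How many results lie between two times."""
    return cumulative_at[upper] - cumulative_at[lower]

# Position of the median on the cumulative frequency axis.
median_position = total / 2
```

`count_above(30)` gives 32, `count_between(40, 60)` gives 14 and `median_position` is 25, which agree with the working shown.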

You learned how to find an estimate for the mean of grouped data in chapter 12. Revise this now if you
have forgotten it.


Exercise 20.3

1 The heights of 25 plants were measured to the nearest centimetre.
The results are summarised in the table.

(a) Draw a cumulative frequency table for this distribution.
(b) In which interval does the median plant height lie?
(c) Draw the cumulative frequency curve and use it to estimate, to the nearest centimetre,
the median plant height.

2 The table shows the amount of money, $x, spent on books by a group of students.

Amount spent ($x)
0 ≤ x < 10
10 ≤ x < 20
20 ≤ x < 30
30 ≤ x < 40
40 ≤ x < 50
50 ≤ x < 60

(a) Calculate an estimate of the mean amount of money per student spent on books.
(b) Use the information in the table above to find the values of p, q and r in the
cumulative frequency table.
(c) Using a scale of 1 cm to represent 10 units on each axis, draw a cumulative frequency
diagram.
(d) Use your diagram to estimate the median amount spent.

3 This cumulative frequency table shows the distribution of the masses of the children
attending a clinic.

Mass (m)
0 < m ≤ 10.0
0 < m ≤ 20.0
0 < m ≤ 30.0
0 < m ≤ 40.0
0 < m ≤ 50.0
0 < m ≤ 60.0

(a) Draw a cumulative frequency diagram. Use a horizontal scale of 1 cm = 10 kg and
a vertical scale of 0.5 cm = 5 children.
(b) Estimate the median mass.
(c) How many children had a mass higher than the median mass?

Quartiles
In chapter 12 you found the range (the biggest value - the smallest value) to see how dispersed
various sets of data were. The range, however, is easily affected by outliers (extreme or unusual
values), so it is not always the best measure of how the data is spread out.

The data shown on a cumulative frequency curve can be divided into four equal groups
called quartiles to find a measure of spread called the interquartile range, which is more
representative than the range because it is not affected by extremes.

The cumulative frequency curve on the next page shows the marks obtained by 64 students in a
test. These are listed below:

• 48 students scored less than 15 marks. 15 marks is the upper quartile or third quartile Q3.
• 32 students scored less than 13 marks. 13 marks is the second quartile Q2, or median mark.
• 16 students scored less than 11 marks. 11 marks is the lower quartile or first quartile Q1.

[Cumulative frequency curve: Marks scored by students. Vertical axis: cumulative frequency (0 to 64); horizontal axis: Mark (9 to 18). The curve is marked at the upper quartile (3n/4 = 48, Q3 = 15), the second quartile (n/2 = 32, Q2 = 13, the median mark) and the lower quartile (n/4 = 16, Q1 = 11).]

Whole number values are being used in this example to make it easier to understand. Usually your
answers will be estimates and they will involve decimal fractions.

When finding the positions of quartiles from a cumulative frequency curve you do not use the
(n + 1) formulae that you met for discrete data in chapter 12. Instead you use n/4, n/2 and 3n/4.

The interquartile range

The interquartile range (IQR) is the difference between the upper and lower quartiles: Q3 - Q1.
In effect, this is the range of the middle 50% of the scores, or the median of the upper half of the
values minus the median of the lower half of the values.

In the example above, the IQR = 15 - 11 = 4

Because the interquartile range does not use any extreme small or large values it is considered a
more reliable measure of spread than the range.
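Reading quartiles off a cumulative frequency curve can be imitated with a small program. The sketch below joins the plotted points with straight lines, which is only an approximation to a smooth hand-drawn curve, and it uses just the three points quoted in the example (16 students below 11 marks, 32 below 13 and 48 below 15). The function name `value_at_position` is an illustrative choice.

```python
def value_at_position(points, position):
    """Estimate the data value at a cumulative frequency position.

    points is a list of (value, cumulative frequency) pairs in increasing
    order. Assumption: neighbouring points are joined by straight lines.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= position <= c1:
            return x0 + (position - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("position lies outside the plotted points")

# (mark, cumulative frequency) pairs from the example with 64 students.
points = [(11, 16), (13, 32), (15, 48)]
n = 64

q1 = value_at_position(points, n / 4)      # lower quartile
q2 = value_at_position(points, n / 2)      # median
q3 = value_at_position(points, 3 * n / 4)  # upper quartile
iqr = q3 - q1
```

The positions n/4 = 16, n/2 = 32 and 3n/4 = 48 give quartiles of 11, 13 and 15 marks, so the interquartile range is 4, as in the example.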


Using the data set in worked example 8:

The position of the 10th percentile on the cumulative frequency axis is $P_{10} = \dfrac{10 \times 1000}{100} = 100$

The position of the 85th percentile on the cumulative frequency axis is $P_{85} = \dfrac{85 \times 1000}{100} = 850$

(Don't forget that you need to move right to the curve and down to the horizontal axis to find
the values of the percentiles.)

The percentile range is the difference between given percentiles. In the example above,
this is $P_{85} - P_{10}$.

In chapter 12, percentiles were first introduced but only the 25th and 75th percentiles were used
to introduce the interquartile range. A question was posed at the start of section 12.5 on page 240:
'All those candidates above the 80th percentile will be offered an interview. What does this
mean?' The following worked example shows you how to answer this question.

The cumulative frequency curve shows the test results of 200 candidates who have
applied for a post at Fashkiddler's. Only those who score above the 80th percentile will
be called for an interview. What is the lowest score that can be obtained to receive an
interview letter?

[Cumulative frequency curve: Candidate test scores. Vertical axis: Cumulative frequency (0 to 200); horizontal axis: Test score.]

80% of 200 is 160.

So, the value of P80 is a test score of 35. (Read off the graph where the curve is 160.)
Only those candidates who scored above 35 marks on the test will be called for
an interview.
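The first step of this answer, finding where a percentile sits on the cumulative frequency axis, is a one-line calculation. The sketch below is illustrative only; the function name `percentile_position` is made up for this example.

```python
def percentile_position(k, n):
    """Position of the kth percentile on the cumulative frequency axis
    for a data set with total frequency n."""
    return k * n / 100

# 80th percentile for the 200 candidates in the worked example.
p80_position = percentile_position(80, 200)

# The quartiles are the 25th and 75th percentiles; for the 64 students
# in the quartiles example these match n/4 and 3n/4.
lower_quartile_position = percentile_position(25, 64)
upper_quartile_position = percentile_position(75, 64)
```

`p80_position` is 160, the value used above. The position is only the starting point: you still have to move across to the curve and down to the horizontal axis to read off the test score.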


Exercise 20.4

1 The lengths of 32 metal rods were measured and recorded on this cumulative frequency
curve. Use the graph to find an estimate for:

(a) the median
(b) Q1
(c) Q3
(d) the IQR
(e) the 40th percentile.

[Cumulative frequency curve for the lengths of the rods. Vertical axis: Cumulative frequency (0 to 35).]

2 This cumulative frequency curve compares the results 120 students obtained on two maths papers.

(a) For each paper, use the graph to find:
(i) the median mark
(ii) the IQR
(iii) the 60th percentile.
(b) What mark would you need to get to be above the 90th percentile on each paper?

[Cumulative frequency curves for the two papers. Vertical axis: Cumulative frequency (0 to 120); horizontal axis: Marks (0 to 80).]

3 This cumulative frequency curve shows the masses of 500 12-year-old girls (in kg).

[Cumulative frequency curve: Mass of 12-year-old girls. Vertical axis: Cumulative frequency; horizontal axis: Mass (kg).]

(a) Use the graph to work out:
(i) the median mass of the 12-year-olds
(ii) how many girls have a mass between 40 and 50 kg.
(b) What percentage of girls ... to go on an amusement park children's ride if the
upper mass limit for the ride is 51 kg?

4 This cumulative frequency table gives the speeds of 200 cars travelling on the highway from
Kuala Lumpur International Airport into the city.

Speed (s)
60 ≤ s < 70
60 ≤ s < 80
60 ≤ s < 90
60 ≤ s < 100
60 ≤ s < 110
60 ≤ s < 120
60 ≤ s < 130
60 ≤ s < 140

(a) Draw a cumulative frequency curve to show this data. Use a scale of 1 cm per 10 km/h
on the horizontal axis and a scale of 1 cm per 10 cars on the vertical axis.
(b) Use your curve to estimate the median, Q1 and Q3 for this data.
(c) Estimate the IQR.
(d) The speed limit on this stretch of road is 120 km/h. What percentage of the cars were
speeding?
Summary

Do you know the following?
• Histograms are specialised bar graphs used for displaying continuous and grouped data.
• There is no space between the bars of a histogram because the horizontal scale is continuous.
• When the class widths are equal the bars are equally wide and the vertical axis shows the frequency.
• If the class widths are unequal, the bars are not equally wide and the vertical axis shows the frequency density.
• Frequency density = frequency per class interval ÷ class width
• Cumulative frequency is a running total of the class frequencies up to each upper class boundary.
• When cumulative frequencies are plotted they give a cumulative frequency curve or ogive.
• The curve can be used to estimate the median value in the data.
• The data can be divided into four equal groups called quartiles. The interquartile range is the difference
between the upper and lower quartiles (Q3 - Q1).
• Large masses of data can be divided into percentiles which divide the data into 100 equal groups. They are
used to compare and rank measurements.

Are you able to ...?
• read and interpret histograms with equal intervals
• construct histograms with equal intervals
• interpret and construct histograms with unequal intervals
• construct a table to find the frequency density of different classes
• calculate cumulative frequencies
• plot and draw a cumulative frequency curve
• use a cumulative frequency curve to estimate the median
• find quartiles and calculate the interquartile range
• estimate and interpret percentiles.
