SM025 - Topic 6 - Student

The document discusses different types of data and methods for organizing and describing data statistically. It covers discrete and continuous data, ungrouped and grouped data, and how to construct and interpret stem-and-leaf diagrams, measures of central tendency including mean, median and mode, and measures of spread such as quartiles and percentiles.


SM025 | MATHEMATICS 2| 2023/2024

TOPIC 6: DATA DESCRIPTION

TOPIC : 6.0 DATA DESCRIPTION

SUBTOPIC : 6.1 Introduction to Data

LEARNING OUTCOMES : At the end of this lesson, students should be able to:

(a) Identify discrete and continuous data
(b) Identify ungrouped and grouped data
(c) Construct and interpret stem-and-leaf diagrams

A. Identify the discrete and continuous data

Types of Data

Quantitative data
- data whose observations can be measured because they possess a natural order or ranking, such as age, weight, height, marks.

Qualitative data
- data for which numerical measurement is not possible. An observation is made when an individual is assigned to one of several mutually exclusive categories, such as colours, types of cars.

Discrete data are data that assume countable values, examples:
a) The number of students in a class.
b) The number of cars sold on any day at a car dealership.
c) The COVID-19 cases per day.

Continuous data are data that can assume any numerical value over a certain interval, examples:
a) The weight of students in a class.
b) The time taken to complete an examination.
c) Income of a family.

Example 1: Based on the following statements, determine whether the data are discrete or continuous.
(a) The travel duration from Kuala Pilah to Bahau.
(b) The number of clothes sold by a charity shop in Senawang.
(c) The height of kids born in January.
(d) The number of cancer cases reported in Malaysia per day.
(e) The length of the bundle of firewood.


B. Ungrouped and Grouped Data

Raw data can be presented as ungrouped data or grouped data.

a) Grouped data are categorized into mutually exclusive intervals and can be presented in a frequency distribution table, histogram, frequency polygon or ogive.

Example

Height (cm) 150-155 155-160 160-165 165-170


Frequency 2 8 6 5

b) Ungrouped Data are listed as a sequence or in the form of a frequency table but without the use of intervals.

Example

Number of children 0 1 2 3 4
Number of families 4 6 7 2 1

C. Construct and interpret stem-and-leaf diagrams


A stem-and-leaf diagram contains all of the data in full detail: every data value in the dataset can be read back from the diagram. These diagrams make it easier to see the range of values in the dataset and the relative frequency of each. The digit(s) in the greatest place value(s) of the data values are the stems. The digits in the next greatest place value are the leaves. For example, if all the data are two-digit numbers, the digit in the tens place is used for the stem and the digit in the ones place is used for the leaf.

Example 2

Construct a stem-and-leaf diagram for the data below:

12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41

Solution
The "stem" is the left-hand column which contains the tens digits. The "leaves" are the lists in the right-hand
column, showing all the ones digits for each of the tens, twenties, thirties, and forties.
Stem Leaf
1 2 3
2 1 7
3 3 4 5 7
4 0 0 1
Key: 1| 2 means 12
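
The construction in Example 2 can be sketched in Python. This is only an illustration, not part of the original notes: the function names `stem_and_leaf` and `render` are made up here, and the sketch assumes non-negative whole numbers whose last digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens part -> stem, ones digit -> leaf
        rows[stem].append(leaf)
    return dict(rows)

def render(rows):
    """Return the diagram as text, one stem per line."""
    lines = ["Stem | Leaf"]
    for stem in sorted(rows):
        leaves = " ".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem:>4} | {leaves}")
    return "\n".join(lines)

data = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
diagram = stem_and_leaf(data)
print(render(diagram))
```

A stem that has no data values is simply left out by this sketch; when drawing by hand it is usual to keep such a stem with an empty leaf row.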

Example 3

Complete a stem-and-leaf plot for the following list of grades on a recent test:

73, 42, 67, 78, 99, 84, 91, 82, 86, 94

Exercise

1. The marks of 30 candidates in an examination are given below. Construct a stem-and-leaf diagram for the
marks:

62 21 4 26 7 38 64 12 38 45
33 55 62 48 49 7 9 41 21 30
3 25 67 8 18 43 72 23 5 17

Answer
Marks

Stem Leaf
0 3 4 5 7 7 8 9
1 2 7 8
2 1 1 3 5 6
3 0 3 8 8
4 1 3 5 8 9
5 5
6 2 2 4 7
7 2
Key: 0| 3 means 3


SUBTOPIC : 6.2 Measures of Location

LEARNING OUTCOMES : At the end of this lesson, students should be able to:

(a) Find and interpret the mean, mode, median, quartiles and percentiles for ungrouped data.
(b) Construct and interpret box-and-whisker plots for ungrouped data.
(c) Find and interpret the mean, mode, median, quartiles and percentiles for grouped data.

A. Mean, Median, Mode, Quartiles and Percentiles for Ungrouped Data.

Mean
The average of a set of data $x_1, x_2, x_3, \ldots, x_n$ is written as $\bar{x}$ and defined as

$$\bar{x} = \frac{\text{sum of all data}}{\text{number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

$$\bar{x} = \frac{\sum x}{n} \quad \text{(for a sequence)}$$

OR

$$\bar{x} = \frac{\sum fx}{\sum f} \quad \text{(for a table)}$$

Median
Step 1: Arrange the data in ascending order.
Step 2:
a) When the number of data ($n$) is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
b) When the number of data ($n$) is even, the median is the mean of the two middle values.

Mode
The mode of a set of data is the value that occurs most frequently.

Mean
The mean of a set of data $x_1, x_2, x_3, \ldots, x_n$ is written as $\bar{x}$ and defined as

$$\bar{x} = \frac{\text{sum of all data}}{\text{number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$


Example 1
(a) Find the mean of a set of numbers
3, 5, 7, 4, 5, 9, 6
(b) Find the mean of a set of data
Number of Male Children 0 1 2 3 4 5
Frequency 2 5 7 3 2 1

(c) Find the mean of a set of numbers


2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6
Solution
a)

b)

c)
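
As a way to check hand-worked answers to Example 1, the two mean formulas (for a sequence and for a frequency table) can be sketched in Python. The function names below are illustrative only, and part (c) is entered as a frequency table obtained by counting the repeated values in the list.

```python
def mean_of_list(values):
    """Mean of a sequence: sum of x divided by n."""
    return sum(values) / len(values)

def mean_of_table(values, freqs):
    """Mean of a frequency table: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# (a) a plain sequence
mean_a = mean_of_list([3, 5, 7, 4, 5, 9, 6])

# (b) number of male children with frequencies
mean_b = mean_of_table([0, 1, 2, 3, 4, 5], [2, 5, 7, 3, 2, 1])

# (c) the long list, counted as 2 x3, 3 x5, 4 x6, 5 x7, 6 x4
mean_c = mean_of_table([2, 3, 4, 5, 6], [3, 5, 6, 7, 4])

print(round(mean_a, 2), round(mean_b, 2), round(mean_c, 2))
```

Treating (c) as a table gives the same value as adding all 25 numbers one by one, which is why the table formula is convenient for data with many repeats.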

Median

The median is the middle value of a set of data that has been arranged in order of magnitude.

Example 2

Find the median for the following set of data.

180 186 191 201 209 219 220

Solution

180 186 191 201 209 219 220

median
Median = 201


For a set of data $x_1, x_2, x_3, \ldots, x_n$ arranged in order of magnitude, there are two cases.

a) When the number of data ($n$) is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.

b) When the number of data ($n$) is even, the median is the mean of the two middle values.

Example 3
Find the median for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.56, 2.71, 5.48, 8.61, 4.35, 6.22
Solution

a) Median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation
b)

Mode

The mode of a set of data is the value that occurs most frequently.

Example 4

Find the mode for the following set of data

a) 5, 2, 3, 3, 5, 4, 28, 5

b) 2, 3, 5, 8, 10

c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5

Solution

(a) 2, 3, 3, 4, 5, 5, 5, 28

Mode =

(b) 2, 3, 5, 8, 10

Mode =

(c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.5, 0.7, 0.7, 0.7

Mode =
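
A short Python sketch for checking the mode by counting is given below. It is an illustration rather than the method of the notes: the function name `modes` is made up, and it assumes the convention that a data set in which every value occurs exactly once has no mode, while tied most-frequent values are all reported.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

mode_a = modes([5, 2, 3, 3, 5, 4, 28, 5])
mode_b = modes([2, 3, 5, 8, 10])
mode_c = modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5])
print(mode_a, mode_b, mode_c)
```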

Quartiles

Quartiles divide a set of data arranged in ascending order into 4 equal parts.

Step 1: Arrange the data in ascending order.

Step 2: Find $r = \frac{k}{4}\,n$, where $n$ = number of observations and $k$ = 1, 2, 3 is the quartile number for $Q_k$.

Step 3:

$$Q_k = \begin{cases} \frac{1}{2}\left(x_r + x_{r+1}\right), & \text{if } r \text{ is an integer} \\ x_{\lceil r \rceil}, & \text{if } r \text{ is not an integer} \end{cases}$$

where $\lceil r \rceil$ is the nearest integer larger than $r$ (round up to the nearest integer).

Percentiles

Percentiles divide a set of data arranged in ascending order into 100 equal parts.

Step 1: Arrange the data in ascending order.

Step 2: Find $s = \frac{k}{100}\,n$, where $n$ = number of observations and $k$ = 1, 2, 3, …, 99 is the percentile number for $P_k$.

Step 3:

$$P_k = \begin{cases} \frac{1}{2}\left(x_s + x_{s+1}\right), & \text{if } s \text{ is an integer} \\ x_{\lceil s \rceil}, & \text{if } s \text{ is not an integer} \end{cases}$$

where $\lceil s \rceil$ is the nearest integer larger than $s$ (round up to the nearest integer).

Notes:

- $Q_1 = P_{25}$ is the 1st quartile
- $Q_2 = P_{50}$ is also called the median
- $Q_3 = P_{75}$ is the 3rd quartile
- Interquartile range = $Q_3 - Q_1$


Example 5
Find the median, first quartile ( Q1 ) and third quartile ( Q3 ) for the following sets of data
(a) 21, 24, 17, 28, 36, 20, 32
( b ) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6

Solution
(a) The data arranged in ascending order :
17, 20, 21, 24, 28, 32, 36

( b ) The data arranged in ascending order :


2.7, 3.5, 4.3, 5.4, 6.2, 7.6, 8.6, 9.9
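
The three-step quartile rule (find r = (k/4)n, then average two neighbours if r is an integer, otherwise round r up) can be sketched in Python as a check on Example 5. The function name `quartile` is illustrative, and `Fraction` is used only so that the test "r is an integer" is exact.

```python
from fractions import Fraction
from math import ceil

def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) using the rank rule r = (k/4) * n."""
    xs = sorted(data)                     # Step 1: ascending order
    r = Fraction(k * len(xs), 4)          # Step 2: exact rank
    if r.denominator == 1:                # Step 3: r is an integer
        return (xs[int(r) - 1] + xs[int(r)]) / 2   # mean of x_r and x_(r+1)
    return xs[ceil(r) - 1]                # x at rank ceil(r), ranks start at 1

set_a = [21, 24, 17, 28, 36, 20, 32]
set_b = [3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6]

quartiles_a = [quartile(set_a, k) for k in (1, 2, 3)]
quartiles_b = [quartile(set_b, k) for k in (1, 2, 3)]
print(quartiles_a)
print(quartiles_b)
```

Other textbooks and software packages use slightly different quartile conventions, so results from a calculator or spreadsheet may not match this rule exactly.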


Example 6

The following data shows the number of books borrowed daily from a library in February. Find P40 .
(a) 60, 63, 77, 50, 66, 71, 73, 89, 70, 68
(b) 75, 66, 77, 73, 89, 80, 78, 55, 67

Solution

(a) First, arrange the data in an ascending order

50, 60, 63, 66, 68, 70, 71, 73, 77, 89

$P_k = x_{\left(\frac{k}{100}\right) n}$, with $n = 10$, $k = 40$

$P_{40} = x_{\left(\frac{40}{100}\right)(10)}$

(b) 55, 66, 67, 73, 75, 77, 78, 80, 89

$P_k = x_{\left(\frac{k}{100}\right) n}$, with $n = 9$, $k = 40$
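
The same rank rule covers both quartiles and percentiles if the number of equal parts is made a parameter. The sketch below is an illustration under that assumption (the name `quantile_by_rank` and its `parts` parameter are not from the notes); it uses integer division so that no rounding error can affect the "is the rank an integer" test.

```python
def quantile_by_rank(data, k, parts):
    """k-th of `parts` equal divisions: parts=4 for quartiles, 100 for percentiles."""
    xs = sorted(data)
    # rank = k*n/parts = whole + rem/parts
    whole, rem = divmod(k * len(xs), parts)
    if rem == 0:
        return (xs[whole - 1] + xs[whole]) / 2   # integer rank: mean of two neighbours
    return xs[whole]                             # round the rank up (0-based index = whole)

books_a = [60, 63, 77, 50, 66, 71, 73, 89, 70, 68]
books_b = [75, 66, 77, 73, 89, 80, 78, 55, 67]

p40_a = quantile_by_rank(books_a, 40, 100)   # rank 4 is an integer
p40_b = quantile_by_rank(books_b, 40, 100)   # rank 3.6 is rounded up to 4
print(p40_a, p40_b)
```

With this function the note that the first quartile equals the 25th percentile can be checked directly, for example `quantile_by_rank(books_a, 1, 4) == quantile_by_rank(books_a, 25, 100)`.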


B. Construct and interpret box-and-whisker plots for ungrouped data.


Box-and-Whisker Plots

A box-and-whisker plot, also called a boxplot, is based on the five-number summary and provides a graphical display of the centre and variation of a data set.

To construct a Boxplot

Step 1: Determine the five number summary.

Step 2: Calculate the upper and lower inner fences to determine whether the data has any outliers.
Upper inner fence = Q3 + 1.5 (Q3 – Q1)
Lower inner fence = Q1 – 1.5 (Q3 – Q1)

Step 3: Draw a horizontal axis with a suitable scale on which the numbers obtained in Step 1 can be located. Above this axis, mark each value of the five-number summary with a vertical line.

Step 4: Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.

Note: The five-number summary is min, Q1, Q2, Q3 and max.

[Figure: boxplot on an axis from 10 to 100 with min, Q1, Q2, Q3 and max marked, all lying between the lower inner fence and the upper inner fence]

All the data lie within the lower and upper inner fences, so the data has no outliers.

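[Figure: boxplot on an axis from 10 to 100 with min, Q1, Q2 and Q3 marked, and one outlier beyond the upper inner fence]

An observation that lies outside the fences is known as an outlier. When there is an outlier, the whisker is drawn only as far as the nearest value that lies inside the fence (here, the largest value below the upper inner fence), and the outlier is marked with a solid circle or a cross.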


Shape of data distribution – symmetry and skewness

Symmetrical distribution – the two whiskers are the same length and the median is in the centre of the box.

Positively skewed distribution – the left whisker is shorter than the right whisker and the median is nearer to Q1.

Negatively skewed distribution – the left whisker is longer than the right whisker and the median is nearer to Q3.

[Figures: one boxplot for each case, each with Q1, Q2 and Q3 marked]

Example 7

Data :
40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66
(a) Find the first, second and third quartile, upper and lower inner fence.
(b) Construct a box and whisker plot for the above data.


Solution
(a) Arrange data in ascending order
31, 32, 40, 41, 41, 45, 50, 51, 52, 54, 55, 57, 61,
61, 62, 65, 65, 66, 66, 66, 68, 69, 70, 72, 79
Number of observation, n = 25, min = 31 , max = 79

(b)
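
The numbers needed before drawing the boxplot in Example 7 (five-number summary, inner fences, outliers) can be checked with a short Python sketch. It assumes the rank rule for quartiles given earlier in these notes; the helper name `quartile` and the variable names are illustrative only.

```python
def quartile(xs, k):
    """k-th quartile of an already sorted list, rank rule r = (k/4) * n."""
    whole, rem = divmod(k * len(xs), 4)
    return (xs[whole - 1] + xs[whole]) / 2 if rem == 0 else xs[whole]

data = [40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
        51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66]
xs = sorted(data)

# Step 1: five-number summary
q1, q2, q3 = (quartile(xs, k) for k in (1, 2, 3))
five_numbers = (xs[0], q1, q2, q3, xs[-1])

# Step 2: inner fences and outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in xs if x < lower_fence or x > upper_fence]

print(five_numbers, lower_fence, upper_fence, outliers)
```

If the list of outliers is empty, the whiskers run all the way to the minimum and maximum; otherwise each whisker stops at the most extreme value still inside the fences.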

C. Mean, mode and median for grouped data.

Mean

If a set of grouped data is given as a frequency distribution, for example in the form of class intervals, the mean is defined as:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum fx}{\sum f}$$

where $x_i$ is the midpoint of the $i$th class and $f_i$ is the corresponding frequency.

Example 1

Find the midpoint of each class interval and, using the formula, calculate the mean lifetime of the 40 batteries.

Time (years) Number of batteries

1.5 - 1.9 2
2.0 - 2.4 1
2.5 - 2.9 4
3.0 - 3.4 15
3.5 - 3.9 10
4.0 - 4.4 5
4.5 - 4.9 3

Solution

Time (years)    Number of batteries ($f_i$)    Midpoint ($x_i$)                 $f_i x_i$

1.5 - 1.9       2                              $\frac{1.5 + 1.9}{2} = 1.7$
2.0 - 2.4       1                              $\frac{2.0 + 2.4}{2} = 2.2$
2.5 - 2.9       4
3.0 - 3.4       15
3.5 - 3.9       10
4.0 - 4.4       5
4.5 - 4.9       3

Sum of frequencies $\sum f_i = 40$             $\sum f_i x_i =$

Mean =
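
Once the table has been completed by hand, the grouped mean can be checked with the short Python sketch below. The variable names are illustrative; each class is entered as (lower limit, upper limit, frequency) and the midpoint is taken as the average of the two limits, as in the first two rows of the table.

```python
# (lower limit, upper limit, number of batteries)
classes = [
    (1.5, 1.9, 2),
    (2.0, 2.4, 1),
    (2.5, 2.9, 4),
    (3.0, 3.4, 15),
    (3.5, 3.9, 10),
    (4.0, 4.4, 5),
    (4.5, 4.9, 3),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
total_f = sum(f for _, _, f in classes)                      # sum of f
total_fx = sum(f * x for (_, _, f), x in zip(classes, midpoints))  # sum of f*x
grouped_mean = total_fx / total_f

print([round(x, 1) for x in midpoints])
print(total_f, round(total_fx, 1), round(grouped_mean, 4))
```

Because the individual lifetimes are unknown, this value is an estimate of the mean that treats every battery in a class as if its lifetime were the class midpoint.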


Mode

The mode can be calculated using the formula:

$$\text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C$$

where
$L_B$ = lower class boundary of the mode class
$d_1$ = the difference between the frequency of the mode class and the frequency of the class before it
$d_2$ = the difference between the frequency of the mode class and the frequency of the class after it
$C$ = class width

Class Boundaries
Let $x$ be half the gap between the upper limit of one class and the lower limit of the next, for example $x = 0.5$ when the class limits are whole numbers.

Lower boundary = Lower limit – $x$

Upper boundary = Upper limit + $x$

Example 2

Find the mode of frequency distribution given below :

Class Interval Frequency

15 - 19 1

20 - 24 4
25 - 29 22

30 - 34 35

35 - 39 20

40 - 44 8


Solution

Class Interval    Frequency

15 – 19           1
20 – 24           4
25 – 29           22
30 – 34           35    (mode class: the class interval with the highest frequency)
35 – 39           20
40 – 44           8

Thus we can calculate the mode,

Mode =
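
The mode formula can be sketched in Python to check the hand calculation. The function name `grouped_mode` and the `gap` parameter (the distance between the upper limit of one class and the lower limit of the next, 1 in this example) are assumptions made for illustration. The sketch assumes a single modal class and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(limits, freqs, gap=1):
    """Estimate the mode of grouped data: LB + d1 / (d1 + d2) * C."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # index of the mode class
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - f_prev
    d2 = freqs[i] - f_next
    lower_limit, upper_limit = limits[i]
    lower_boundary = lower_limit - gap / 2     # e.g. 30 - 0.5 = 29.5
    width = upper_limit - lower_limit + gap    # e.g. 34.5 - 29.5 = 5
    return lower_boundary + d1 / (d1 + d2) * width

limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
freqs = [1, 4, 22, 35, 20, 8]
mode_estimate = grouped_mode(limits, freqs)
print(round(mode_estimate, 2))
```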

Median

The median of a frequency distribution cannot be found in the same way as for ungrouped data, because the data have been grouped into classes. The median is the value with 50% of the observations lying on either side of it when the data are arranged in order of magnitude.

The median class must be determined before the median is calculated. The median lies at the $\left(\frac{n}{2}\right)^{\text{th}}$ observation, which is located by referring to the cumulative frequency.

An estimated value of the median is then obtained from the formula

$$\text{Median} = L_k + \left(\frac{\frac{n}{2} - F_{k-1}}{f_k}\right) C$$

where
$L_k$ = the lower class boundary of the median class
$n$ = the number of data, or the sum of the frequencies
$F_{k-1}$ = the cumulative frequency before the median class
$C$ = the class width
$f_k$ = the frequency of the median class


Example3 (a)

Find the cumulative frequency and calculate the median.

Class Interval Frequency


1 - 5 1
6 - 10 3
11 - 15 5
16 - 20 7
21 - 25 13
26 - 30 9
31 - 35 7
36 - 40 3
41 - 45 2
Solution
Class Interval Frequency Cumulative frequency
1 - 5 1 1
6 - 10 3 4
11 - 15 5 9
16 - 20 7 16
21 - 25 13 29
26 - 30 9 38
31 - 35 7 45
36 - 40 3 48
41 - 45 2 50

Median = $\left(\frac{50}{2}\right)^{\text{th}}$ observation = 25th observation

The median class is found from the cumulative frequency: the 25th observation lies in the class 21 - 25, whose cumulative frequency is 29.

Then calculate the median,

Median =

Example 3 (b)

Class Interval    Frequency

3 ≤ x < 8         2
8 ≤ x < 13        4
13 ≤ x < 18       7
18 ≤ x < 23       8
23 ≤ x < 28       6

For the above data, calculate the median.

Median position: $\frac{N}{2} = \frac{27}{2} = 13.5$, so the median class is 18 ≤ x < 23.

Median, m =

Quartiles

For grouped data, the $k$th quartile is

$$Q_k = L_k + \left(\frac{\frac{k}{4}\,n - F_{k-1}}{f_k}\right) c\,; \quad k = 1, 2, 3.$$

where
$L_k$ – lower boundary of the class where $Q_k$ lies.
$n$ – total number of observations.
$F_{k-1}$ – cumulative frequency before the $Q_k$ class.
$f_k$ – frequency of the class where $Q_k$ lies.
$c$ – size of the class where $Q_k$ lies.

Percentiles
For grouped data, the $k$th percentile is

$$P_k = L_k + \left(\frac{\frac{k}{100}\,n - F_{k-1}}{f_k}\right) c\,; \quad k = 1, 2, 3, \ldots, 99.$$

where
$L_k$ – lower boundary of the class where $P_k$ lies.
$n$ – total number of observations.
$F_{k-1}$ – cumulative frequency before the $P_k$ class.
$f_k$ – frequency of the class where $P_k$ lies.
$c$ – size of the class where $P_k$ lies.

Note:
i. The 25th percentile is called the first quartile, Q1.
ii. The median is the 50th percentile, also called the second quartile, Q2.
iii. The 75th percentile is called the third quartile, Q3.
iv. The interquartile range is the difference between the third quartile and the first quartile (Q3 – Q1).

Example 4
For the frequency distribution given below,

Class interval Frequency


20 – 29 4
30 – 39 11
40 – 49 20
50 – 59 45
60 – 69 25
70 – 79 12
80 – 89 3
Find the
(a) median
(b) first quartile
(c) third quartile
(d) Interquartile range
(e) 10th and 70th percentile.

Solution

Class interval Class boundary Frequency, fi Cumulative frequency, F


20 – 29 19.5 – 29.5 4 4
30 – 39 29.5 – 39.5 11 15
40 – 49 39.5 – 49.5 20 35
50 – 59 49.5 – 59.5 45 80
60 – 69 59.5 – 69.5 25 105
70 – 79 69.5 – 79.5 12 117
80 – 89 79.5 – 89.5 3 120
Total 120


(a) Median = $L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) c$

Median = $\left(\frac{120}{2}\right)^{\text{th}}$ observation = $x_{(60)}$

The class containing the median is the fourth class.

Median =

(b) Quartile, $Q_k = L_k + \left(\frac{\frac{k}{4}\,n - F_{k-1}}{f_k}\right) c$

$Q_1$ = $\frac{1}{4}(120)^{\text{th}}$ observation = $x_{(30)}$

Thus, $Q_1$ is in the third class with boundaries (39.5 – 49.5)

$Q_1$ =

(c) $Q_3$ = $\frac{3}{4}(120)^{\text{th}}$ observation = $x_{(90)}$

Thus, $Q_3$ is in the fifth class with boundaries (59.5 – 69.5)

$Q_3$ =

(d) Interquartile range = $Q_3 - Q_1$ =

(e) Percentile, $P_k = L_k + \left(\frac{\frac{k}{100}\,n - F_{k-1}}{f_k}\right) c$

10th percentile, $P_{10}$ = $\frac{10}{100}(120)^{\text{th}}$ observation = $x_{(12)}$

Thus, $P_{10}$ is in the second class with boundaries (29.5 – 39.5)

$P_{10}$ =

70th percentile, $P_{70}$ = $\frac{70}{100}(120)^{\text{th}}$ observation = $x_{(84)}$

Thus, $P_{70}$ is in the fifth class with boundaries (59.5 – 69.5)

$P_{70}$ =
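
The median, quartile and percentile formulas for grouped data all have the same shape, so one Python sketch can be used to check every part of Example 4. The function name `grouped_quantile` and its `parts` parameter (2 for the median, 4 for quartiles, 100 for percentiles) are illustrative assumptions, not notation from the notes.

```python
def grouped_quantile(boundaries, freqs, k, parts):
    """L + ((k/parts)*n - F_before) / f * c for the class containing the target rank."""
    n = sum(freqs)
    target = k * n / parts          # e.g. (1/4) * 120 = 30 for Q1
    cumulative = 0                  # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target rank lies beyond the last class")

boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5),
              (59.5, 69.5), (69.5, 79.5), (79.5, 89.5)]
freqs = [4, 11, 20, 45, 25, 12, 3]

median = grouped_quantile(boundaries, freqs, 1, 2)
q1 = grouped_quantile(boundaries, freqs, 1, 4)
q3 = grouped_quantile(boundaries, freqs, 3, 4)
p10 = grouped_quantile(boundaries, freqs, 10, 100)
p70 = grouped_quantile(boundaries, freqs, 70, 100)

print(round(median, 2), q1, q3, q3 - q1, round(p10, 2), round(p70, 2))
```

The loop finds the class by walking up the cumulative frequency, which is the same lookup that the solution performs by reading the cumulative frequency column.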

SUBTOPIC : 6.3 Measures of Dispersion

LEARNING OUTCOMES : At the end of this lesson, students should be able to:

a) Find and interpret the variance and standard deviation for ungrouped data.
b) Find and interpret the variance and standard deviation for grouped data.
c) Find and interpret Pearson’s coefficient of skewness.

Variance and standard deviation are the most useful and widely used measures of dispersion.
Standard deviation measures how spread out the values in a data set are.

i. If the data points are all close to the mean, then the standard deviation is close to zero.
ii. If many data points are far from the mean, then the standard deviation is far from zero.
iii. If all the data values are equal to the mean, then the standard deviation is zero.

Note: Large SD – data are more dispersed and less consistent.
Small SD – data are less dispersed and more consistent.

A. Variance and standard deviation for ungrouped data

For ungrouped data,

$$\text{variance, } s^2 = \frac{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}{n - 1}\,; \qquad \text{standard deviation, } s = \sqrt{s^2}$$


Example 1

Find the mean, variance and standard deviation for the data below.

2, 7, 10, 9, 2, 5, 16

Solution

x       x²
2       4
7
10
9
2
5
16
$\sum x =$      $\sum x^2 =$

mean =

Variance, $s^2$ =

Standard deviation, $s$ =
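
To check the completed table, the variance formula can be sketched in Python. The function name `sample_variance` is illustrative; the standard library's `statistics.variance` also divides by n − 1 and is included only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [2, 7, 10, 9, 2, 5, 16]
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
mean = sum_x / len(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(sum_x, sum_x2)
print(round(mean, 4), round(variance, 4), round(std_dev, 4))
# Cross-check against the standard library (same n - 1 divisor)
print(round(statistics.variance(data), 4))
```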

Example 2

Given the two data as listed below:

Data I : 8, 18, 9, 10, 12, 16, 13, 15, 16, 13, 13

Data II : 11, 13, 13, 1, 2, 23, 13, 14, 15, 18, 20

Find the mean and standard deviation for the above data and interpret the values obtained.

Solution

Data I: mean $= \frac{143}{11} = 13$

$\sum x = 143$

$\sum x^2 = 8^2 + 18^2 + 9^2 + 10^2 + 12^2 + 16^2 + 13^2 + 15^2 + 16^2 + 13^2 + 13^2 = 1957$

$s^2 =$

Mean data I =
Standard deviation data I =

Data II: mean $= \frac{143}{11} = 13$

$\sum x = 143$

$\sum x^2 = 11^2 + 13^2 + 13^2 + 1^2 + 2^2 + 23^2 + 13^2 + 14^2 + 15^2 + 18^2 + 20^2 = 2307$

$s^2 =$

Mean data II =
Standard deviation data II =

In conclusion, based on the standard deviations, Data II (greater standard deviation) is more dispersed and less consistent than Data I.

Example 3
The data below shows the marks obtained by Nik and Fizz in five tests:

Nik’s marks : 80, 80, 80, 80, 85

Fizz’s marks : 69, 78, 80, 80, 98

Which student shows a better overall performance?

Solution

Nik’s marks: mean $= \frac{80 + 80 + 80 + 80 + 85}{5} = 81$

$\sum x = 405$, $\sum x^2 = 80^2 + 80^2 + 80^2 + 80^2 + 85^2 = 32825$

$$s^2 = \frac{32825 - \dfrac{405^2}{5}}{4} = 5$$

Standard deviation = 2.24

Fizz’s marks: mean $= \frac{69 + 78 + 80 + 80 + 98}{5} = 81$

$\sum x = 405$, $\sum x^2 = 69^2 + 78^2 + 80^2 + 80^2 + 98^2 = 33249$

$$s^2 = \frac{33249 - \dfrac{405^2}{5}}{4} = 111$$

Standard deviation = 10.54

By comparing the variance and standard deviation, Fizz’s marks show a greater dispersion which means that
Fizz’s marks are less consistent. Thus, it can be concluded that Nik shows a better overall performance.

Example 4

The following is the systolic blood pressure, in mm Hg, of 10 patients in a hospital.

165 135 151 155 158 146 149 124 162 173

a) Find the mean and the standard deviation of the systolic blood pressure of the 10 patients.

b) Find the number of patients whose systolic blood pressures exceed one standard deviation above or below the
mean.

Solution

a) The mean, $\bar{x} = \frac{165+135+151+155+158+146+149+124+162+173}{10} = 151.80$

$\sum x = 1518$, $\sum x^2 = 232306$

$$s^2 = \frac{232306 - \dfrac{1518^2}{10}}{9} = 208.1778$$

Standard deviation, $s = \sqrt{208.1778} = 14.43$

b) One standard deviation from the mean:

$(151.80 - 14.43,\ 151.80 + 14.43) = (137.37,\ 166.23)$

The number of patients with systolic blood pressure outside this range is 3 (the readings 124, 135 and 173).


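Variance and standard deviation for grouped data

$$\text{Variance, } s^2 = \frac{\sum fx^2 - \dfrac{\left(\sum fx\right)^2}{n}}{n - 1}\,; \qquad \text{Standard deviation, } s = \sqrt{s^2}$$

where $x$ is the class midpoint, $f$ is the class frequency and $n = \sum f$.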

Example 5

Find the mean, variance and standard deviation for the data below.

Marks Number of students


0 ≤ x < 20 9
20 ≤ x < 40 29
40 ≤ x < 60 42
60 ≤ x < 80 26
80 ≤ x < 100 14

Solution

Marks            Midpoint, x    f      fx      fx²

0 ≤ x < 20       10             9      90      900
20 ≤ x < 40      30             29     870     26100
40 ≤ x < 60                     42
60 ≤ x < 80                     26
80 ≤ x < 100                    14
                                n = 120    $\sum fx =$    $\sum fx^2 =$

Mean, $\bar{x}$ =

$s^2$ =

$s$ =
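
The grouped formulas can be sketched in Python to check the completed table for Example 5. The function name `grouped_stats` is illustrative, and the midpoints are taken as the centres of the mark intervals (10, 30, 50, 70, 90).

```python
import math

def grouped_stats(midpoints, freqs):
    """Return (mean, variance, standard deviation) from midpoints and frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)

midpoints = [10, 30, 50, 70, 90]
freqs = [9, 29, 42, 26, 14]

mean, variance, std_dev = grouped_stats(midpoints, freqs)
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

The intermediate sums that the function forms are the column totals of the table, so the printed results can be compared with the hand-worked values line by line.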


Example 6

The wages per hour (RM) paid to temporary workers in the production and marketing departments of a factory are shown in the following table.

                 Number of temporary workers
Wages            Production department    Marketing department
8 ≤ x < 10       15                       8
10 ≤ x < 12      18                       31
12 ≤ x < 14      35                       58
14 ≤ x < 16      25                       32
16 ≤ x < 20      12                       6

Find estimates for median, mean and standard deviation for wages per hour for all the temporary workers in the
factory.

Solution

Combining the two departments we have the frequency table as follows;

Wages            Midpoint, x    Number of temporary workers, f    fx      fx²

8 ≤ x < 10       9              23                                207     1863
10 ≤ x < 12      11             49                                539     5929
12 ≤ x < 14      13             93                                1209    15717
14 ≤ x < 16      15             57                                855     12825
16 ≤ x < 20      18             18                                324     5832

Total                           240                               3134    42166

N = 240

Median

Mean

Variance,

Standard deviation,

Example 7

The frequency distribution table shows the masses of loaves of bread produced by a bakery.

Mass (g) 420 – 424 425 – 429 430 – 434 435 – 439 440 – 444
Frequency 16 24 25 18 17

a) Find the standard deviation, correct to two decimal places.


b) The bakery allows only loaves of bread each with a mass of within one standard deviation from the mean to
be sold in the market. Find the interval of mass of the loaves of bread allowed to be sold.

Solution

Mass (g) / class boundary    Midpoint, x    f      fx       fx²
419.5 – 424.5                422            16     6752     2849344
424.5 – 429.5                427            24     10248    4375896
429.5 – 434.5                432            25     10800    4665600
434.5 – 439.5                437            18     7866     3437442
439.5 – 444.5                442            17     7514     3321188
                                            n = 100    $\sum fx = 43180$    $\sum fx^2 = 18649470$

a) Variance, $s^2 = \dfrac{\sum fx^2 - \dfrac{\left(\sum fx\right)^2}{n}}{n - 1} = \dfrac{18649470 - \dfrac{43180^2}{100}}{99} = 43.8990$

Standard deviation, $s = \sqrt{43.8990} = 6.63$ g

b) Mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{43180}{100} = 431.80$ g

One standard deviation from the mean:

$(431.80 - 6.63,\ 431.80 + 6.63) = (425.17,\ 438.43)$

Thus, the interval of mass of the loaves of bread allowed to be sold in the market is between 425.17 g and 438.43 g.


Example 8

The frequency distribution table shows the hourly wages of workers in a factory.

Wage (RM) 5–7 8 – 10 11 – 13 14 – 16 17 – 19


Frequency 9 16 11 8 6

a) Find the standard deviation, correct to three decimal places.


b) If the manager of the factory decides to increase the wage of each worker by 20%, find the new standard
deviation.

Solution

Wage (RM) / class boundary    Midpoint, x    f     fx     fx²
4.5 – 7.5                     6              9     54     324
7.5 – 10.5                    9              16    144    1296
10.5 – 13.5                   12             11    132    1584
13.5 – 16.5                   15             8     120    1800
16.5 – 19.5                   18             6     108    1944
                                             n = 50    $\sum fx = 558$    $\sum fx^2 = 6948$

a) Variance, $s^2 = \dfrac{\sum fx^2 - \dfrac{\left(\sum fx\right)^2}{n}}{n - 1} = \dfrac{6948 - \dfrac{558^2}{50}}{49} = 14.7086$

Standard deviation, $s = \sqrt{14.7086}$ = RM 3.835 (3 d.p.)

b) After an increase of 20% in the wages, each midpoint becomes the original midpoint plus 20% of it, that is, 1.2 times the original midpoint:

Wage (RM) / original class boundary    New midpoint, x    f     fx       fx²
4.5 – 7.5                              7.2                9     64.8     466.56
7.5 – 10.5                             10.8               16    172.8    1866.24
10.5 – 13.5                            14.4               11    158.4    2280.96
13.5 – 16.5                            18                 8     144      2592
16.5 – 19.5                            21.6               6     129.6    2799.36
                                                          n = 50    $\sum fx = 669.6$    $\sum fx^2 = 10005.12$

New variance, $s^2 = \dfrac{\sum fx^2 - \dfrac{\left(\sum fx\right)^2}{n}}{n - 1} = \dfrac{10005.12 - \dfrac{669.6^2}{50}}{49} = 21.1803$

New standard deviation, $s = \sqrt{21.1803}$ = RM 4.602 (3 d.p.)
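
The new standard deviation can also be obtained without rebuilding the table. The short derivation below is an added check, not part of the original solution; it writes $a = 1.2$ for the scaling factor and $s_{\text{new}}$ for the new standard deviation, both of which are notation introduced here.

```latex
% Every midpoint x is replaced by ax, with a = 1.2 (a 20% increase).
\begin{align*}
s_{\text{new}}^2
  &= \frac{\sum f(ax)^2 - \dfrac{\left(\sum f\,ax\right)^2}{n}}{n-1}
   = a^2 \cdot \frac{\sum fx^2 - \dfrac{\left(\sum fx\right)^2}{n}}{n-1}
   = a^2 s^2, \\
s_{\text{new}} &= a\,s = 1.2 \times 3.835 \approx 4.602 .
\end{align*}
```

This agrees with the value found from the recomputed table: multiplying every value by a positive constant multiplies the standard deviation by the same constant.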

C. Find and interpret the Pearson’s Coefficient of Skewness

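Pearson’s coefficient of skewness, $S_k$, is calculated as

$$S_k = \frac{3(\text{mean} - \text{median})}{s} \qquad \text{or} \qquad S_k = \frac{\text{mean} - \text{mode}}{s}$$

where $s$ is the standard deviation.

NOTE : When Pearson’s coefficient is very close to 0 (negative or positive), the distribution of the data is almost symmetrical. A negative coefficient indicates skewness to the left and a positive coefficient indicates skewness to the right.

[Figure: scale of skewness, from left to right: highly skewed to the left, moderately skewed to the left, almost symmetrical / slightly skewed, moderately skewed to the right, highly skewed to the right]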


Example 1

Given the following sorted data. Find the Pearson’s coefficient of skewness.

1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8

Solution

Mean = 2.48

median = 2.45

mode = 2.4

s = 0.8176

Sk =

or Sk =
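
Both forms of Pearson’s coefficient, 3(mean − median)/s and (mean − mode)/s, as used in the worked Example 3 of this subtopic, can be checked for this data set with Python's standard `statistics` module. The variable names are illustrative; `statistics.stdev` uses the n − 1 divisor, matching the variance formula in these notes.

```python
import statistics

data = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8]

mean = statistics.mean(data)
median = statistics.median(data)   # mean of the two middle values for even n
mode = statistics.mode(data)       # the single most frequent value here
s = statistics.stdev(data)         # sample standard deviation (n - 1)

sk_median = 3 * (mean - median) / s
sk_mode = (mean - mode) / s

print(round(mean, 2), round(median, 2), mode, round(s, 4))
print(round(sk_median, 3), round(sk_mode, 3))
```

Both coefficients come out small and positive, so either form leads to the same reading of the data as almost symmetrical with a slight skew to the right.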

Example 2

The management of a large hospital recorded the age of a random sample of 160 patients. The
results of this survey are shown in the following table

Age Number of Patients


20 ≤ x < 25 2
25 ≤ x < 30 3
30 ≤ x < 35 4
35 ≤ x < 40 7
40 ≤ x < 45 9
45 ≤ x < 50 16
50 ≤ x < 55 24
55 ≤ x < 60 32
60 ≤ x < 65 30
65 ≤ x < 70 20
70 ≤ x < 75 8
75 ≤ x < 80 3
80 ≤ x < 85 2


For the sample, calculate:

1. the median age
2. the mean age
3. the modal age
4. the standard deviation of the patients’ ages

Then calculate the value of Pearson’s coefficient of skewness.

Solution.

a) Median = $\left(\frac{\sum f}{2}\right)^{\text{th}}$ observation

= $\left(\frac{160}{2}\right)^{\text{th}}$ observation

= 80th observation, which lies in the 55 ≤ x < 60 class interval.

$$\text{Median} = L_{med} + \left(\frac{\frac{\sum f}{2} - F_{med}}{f_{med}}\right) c$$

b) mean =

c) mode =

d) s =

$S_k$ =

or $S_k$ =


Example 3

The following table gives the frequency distribution of 291 workers of a factory according to their average monthly income in 1995 - 2005. Find Pearson’s coefficient of skewness, then comment on the value obtained.

Income group (RM) No.of workers

300 - 500 1

500-700 16

700-900 39

900-1100 58

1100-1300 60

1300-1500 46

1500-1700 22

1700-1900 15

1900-2100 15

2100-2300 9

2300 - 2500 10
Solution
Income group     f     c.f.

300 - 500        1     1
500 - 700        16    17
700 - 900        39    56
900 - 1100       58    114
1100 - 1300      60    174
1300 - 1500      46    220
1500 - 1700      22    242
1700 - 1900      15    257
1900 - 2100      15    272
2100 - 2300      9     281
2300 - 2500      10    291

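$n = \sum f = 291$

Median = $\left(\frac{\sum f}{2}\right)^{\text{th}}$ observation = $\left(\frac{291}{2}\right)^{\text{th}}$ observation, that is, about the 146th item, which lies in the (1100 - 1300) class interval.

$$\text{Median} = L_{med} + \left(\frac{\frac{\sum f}{2} - F_{med}}{f_{med}}\right) c = 1100 + \left(\frac{\frac{291}{2} - 114}{60}\right) \times 200 = \text{RM}1205.00$$

Here $F_{med} = 114$ is the cumulative frequency before the median class and $f_{med} = 60$ is the frequency of the median class.

$$\text{Mode} = 1100 + \left(\frac{2}{2 + 14}\right) \times 200 = \text{RM}1125$$

Using the class midpoints 400, 600, ..., 2400, $\sum fx = 371600$ and $\sum fx^2 = 531520000$.

Mean $= \frac{371600}{291}$ = RM1276.98

Standard deviation $= \sqrt{\dfrac{531520000 - \dfrac{371600^2}{291}}{290}}$ = RM443.32

$$S_k = \frac{3(1276.98 - 1205.00)}{443.32} = 0.487$$

or

$$S_k = \frac{1276.98 - 1125}{443.32} = 0.343$$

Since Pearson’s coefficient is positive, the distribution is slightly skewed to the right.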