SM025 - Topic 6 - Student
LEARNING OUTCOMES: At the end of this lesson, students should be able to:
Types of Data
Qualitative data
Example 1: Based on the following statements, determine whether the data are discrete or continuous.
(a) The travel duration from Kuala Pilah to Bahau.
(b) The number of clothes sold by a charity shop in Senawang.
(c) The height of kids born in January.
(d) The number of cancer cases reported in Malaysia per day.
(e) The length of the bundle of firewood.
SM025 | MATHEMATICS 2| 2023/2024
TOPIC 6: DATA DESCRIPTION
a) Grouped data are categorized into mutually exclusive class intervals and can be presented in a frequency distribution table, histogram, frequency polygon or ogive.
Example
b) Ungrouped data are listed as a sequence or in a frequency table, without the use of intervals.
Example
Number of children 0 1 2 3 4
Number of families 4 6 7 2 1
Example 2
12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41
Solution
The "stem" is the left-hand column, which contains the tens digits. The "leaves" are listed in the right-hand column and show the ones digit of each value in the 10s, 20s, 30s and 40s.
Stem Leaf
1 2 3
2 1 7
3 3 4 5 7
4 0 0 1
Key: 1| 2 means 12
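The stem-and-leaf construction above can be sketched in Python. This is an illustrative helper (the names `stem_and_leaf` and `print_plot` are our own, not from any library). It assumes non-negative whole numbers, uses the tens digit as the stem and the ones digit as the leaf, and prints empty stems so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group whole numbers into {stem: [leaves]}: tens digit -> stem, ones digit -> leaf."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


def print_plot(plot, key_stem, key_leaf):
    print("Stem | Leaf")
    for stem in range(min(plot), max(plot) + 1):  # include empty stems
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem:>4} | {leaves}")
    print(f"Key: {key_stem} | {key_leaf} means {10 * key_stem + key_leaf}")


example_2 = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
print_plot(stem_and_leaf(example_2), 1, 2)
```

The same helper can be used to check the Exercise below. With single-digit marks such as 3, the stem is 0.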
Example 3
Complete a stem-and-leaf plot for the following list of grades on a recent test:
Exercise
1. The marks of 30 candidates in an examination are given below. Construct a stem-and-leaf diagram for the
marks:
62 21 4 26 7 38 64 12 38 45
33 55 62 48 49 7 9 41 21 30
3 25 67 8 18 43 72 23 5 17
Answer
Marks
Stem Leaf
0 3 4 5 7 7 8 9
1 2 7 8
2 1 1 3 5 6
3 0 3 8 8
4 1 3 5 8 9
5 5
6 2 2 4 7
7 2
Key: 0| 3 means 3
LEARNING OUTCOMES: At the end of this lesson, students should be able to:
(a) find and interpret the mean, mode, median, quartiles and percentiles for ungrouped data.
(b) construct and interpret box-and-whisker plots for ungrouped data.
(c) find and interpret the mean, mode, median, quartiles and percentiles for grouped data.
Mean
The average of a set of data $x_1, x_2, x_3, \ldots, x_n$ is written as $\bar{x}$ and defined as
$$\bar{x} = \frac{\text{sum of all data}}{\text{number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
$$\bar{x} = \frac{\sum x}{n} \ \text{(for a sequence)} \quad \text{OR} \quad \bar{x} = \frac{\sum fx}{\sum f} \ \text{(for a table)}$$
Median
Step 1: Arrange the data in ascending order.
Step 2:
a) When the number of data (n) is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
b) When the number of data (n) is even, the median is the mean of the two middle values.
Mode
The mode of a set of data is the value that occurs most frequently.
Mean
The mean of a set of data $x_1, x_2, x_3, \ldots, x_n$ is written as $\bar{x}$ and defined as
$$\bar{x} = \frac{\sum x}{n}$$
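As a check on hand calculations, the two mean formulas can be written as a short Python sketch. The function names are hypothetical helpers, and the data are those of Example 1 below.

```python
def mean_of_sequence(xs):
    """x-bar = (sum of all data) / (number of data)."""
    return sum(xs) / len(xs)


def mean_of_table(values, freqs):
    """x-bar = sum(f * x) / sum(f) for an ungrouped frequency table."""
    if len(values) != len(freqs):
        raise ValueError("each value needs exactly one frequency")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


print(mean_of_sequence([3, 5, 7, 4, 5, 9, 6]))
print(mean_of_table([0, 1, 2, 3, 4, 5], [2, 5, 7, 3, 2, 1]))
```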
Example 1
(a) Find the mean of a set of numbers
3, 5, 7, 4, 5, 9, 6
(b) Find the mean of a set of data
Number of Male Children 0 1 2 3 4 5
Frequency 2 5 7 3 2 1
b)
c)
Median
The median is the middle value of a set of data once the data are arranged in order of magnitude.
Example 2
Solution
median
Median = 201
For a set of data $x_1, x_2, x_3, \ldots, x_n$ arranged in order of magnitude, there are two cases.
a) When the number of data (n) is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
b) When the number of data (n) is even, the median is the mean of the two middle values.
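The two cases translate directly into code. The sketch below (hypothetical helper `median`) converts the 1-based position $\left(\frac{n+1}{2}\right)$ into a 0-based Python index. Python's standard `statistics.median` follows the same odd/even convention and can be used as a cross-check. The data are those of Example 3 below.

```python
def median(xs):
    data = sorted(xs)  # Step 1: arrange in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation sits at 0-based index n // 2
        return data[mid]
    # even n: mean of the (n / 2)th and (n / 2 + 1)th observations
    return (data[mid - 1] + data[mid]) / 2


print(median([21, 24, 17, 28, 36, 20, 32]))
print(median([3.56, 2.71, 5.48, 8.61, 4.35, 6.22]))
```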
Example 3
Find the median for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.56, 2.71, 5.48, 8.61, 4.35, 6.22
Solution
a) Median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation
b)
Mode
The mode of a set of data is the value that occurs most frequently.
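A data set may have one mode, several modes or none at all, so a sketch that returns a list is convenient. The helper below is hypothetical. It adopts the assumed convention that a set in which no value repeats has no mode. The inputs are the three data sets of Example 4 below.

```python
from collections import Counter


def modes(xs):
    """Return all values with the highest frequency; [] if no value repeats."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)


print(modes([5, 2, 3, 3, 5, 4, 28, 5]))
print(modes([2, 3, 5, 8, 10]))
print(modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.5, 0.7, 0.7, 0.7]))
```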
Example 4
Find the mode of each of the following sets of data.
a) 5, 2, 3, 3, 5, 4, 28, 5
b) 2, 3, 5, 8, 10
Solution
(a) 2, 3, 3, 4, 5, 5, 5, 28
Mode =
(b) 2, 3, 5, 8, 10
Mode =
(c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.5, 0.7, 0.7, 0.7
Mode =
Quartiles
Quartiles divide a set of data arranged in ascending order into 4 equal parts.
Step 1: Arrange the data in ascending order.
Step 2: Find $r = \frac{k}{4}n$, where n = number of observations and k refers to the quartile $Q_k$, k = 1, 2, 3.
Step 3:
$$Q_k = \begin{cases} \frac{1}{2}\left(x_r + x_{r+1}\right), & \text{if } r \text{ is an integer} \\ x_{\lceil r \rceil}, & \text{if } r \text{ is not an integer} \end{cases}$$
where $\lceil r \rceil$ is the nearest integer larger than r (round up to the nearest integer).
Percentiles
For a set of data arranged in ascending order, percentiles divide the set into 100 equal parts.
Step 1: Arrange the data in ascending order.
Step 2: Find $s = \frac{k}{100}n$, where n = number of observations and k refers to the $k^{\text{th}}$ percentile $P_k$, k = 1, 2, 3, …, 99.
Step 3:
$$P_k = \begin{cases} \frac{1}{2}\left(x_s + x_{s+1}\right), & \text{if } s \text{ is an integer} \\ x_{\lceil s \rceil}, & \text{if } s \text{ is not an integer} \end{cases}$$
where $\lceil s \rceil$ is the nearest integer larger than s (round up to the nearest integer).
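The three-step rule can be sketched in Python. The helper names are our own. `Fraction` is used so that the test "is r an integer?" is exact rather than subject to floating-point error. Library routines such as `statistics.quantiles` use different interpolation conventions, so they may not reproduce the answers given by this rule.

```python
import math
from fractions import Fraction


def position_rule(data, k, parts):
    """Q_k (parts=4) or P_k (parts=100) by the three-step rule above."""
    xs = sorted(data)                 # Step 1
    r = Fraction(k * len(xs), parts)  # Step 2: r = (k / parts) * n, kept exact
    if r.denominator == 1:            # Step 3: r is an integer
        r = int(r)
        return (xs[r - 1] + xs[r]) / 2   # mean of x_r and x_(r+1), 1-based
    return xs[math.ceil(r) - 1]          # x at r rounded up, 1-based


def quartile(data, k):
    return position_rule(data, k, 4)


def percentile(data, k):
    return position_rule(data, k, 100)


a = [21, 24, 17, 28, 36, 20, 32]
print(quartile(a, 1), quartile(a, 2), quartile(a, 3))
```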
Notes:
Example 5
Find the median, first quartile ( Q1 ) and third quartile ( Q3 ) for the following sets of data
(a) 21, 24, 17, 28, 36, 20, 32
(b) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6
Solution
(a) The data arranged in ascending order:
17, 20, 21, 24, 28, 32, 36
Example 6
The following data show the number of books borrowed daily from a library in February. Find $P_{40}$.
(a) 60, 63, 77, 50, 66, 71, 73, 89, 70, 68
(b) 75, 66, 77, 73, 89, 80, 78, 55, 67
Solution
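(a) $P_k = x_{\left(\frac{k}{100}n\right)}$, where n = 10, k = 40
$P_{40} = x_{\left(\frac{40}{100}(10)\right)}$
$P_{40} =$
(b) $P_k = x_{\left(\frac{k}{100}n\right)}$, where n = 9, k = 40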
A box-and-whisker plot, also called a boxplot, is based on the five-number summary and gives a graphical display of the centre and variation of a data set.
To construct a boxplot:
Step 1: Find the five-number summary: the minimum value, Q1, Q2 (the median), Q3 and the maximum value.
Step 2: Calculate the lower and upper inner fences to determine whether the data have any outliers.
Upper inner fence = Q3 + 1.5 (Q3 – Q1)
Lower inner fence = Q1 – 1.5 (Q3 – Q1)
Step 3: Draw a horizontal axis with a suitable scale on which the values obtained in Step 1 can be located. Above this axis, mark each value of the five-number summary with a vertical line.
Step 4: Connect the quartiles to each other to form a box, and then connect the box to the minimum and maximum values with lines (the whiskers).
[Figure: box-and-whisker plot with no outliers; min, Q1, Q2, Q3 and max are marked on a scale from 10 to 100]
All the data lie within the lower and upper inner fences, so the data have no outliers.
[Figure: box-and-whisker plot with an outlier beyond the upper whisker; min, Q1, Q2, Q3 and the outlier are marked on a scale from 10 to 100]
An observation that lies outside the inner fences is known as an outlier. In this case, the whisker is drawn only to the largest value that lies inside the fence, and the outlier is marked with a solid dot or a cross.
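Steps 1 and 2, together with the outlier rule, can be sketched as follows. The function names are hypothetical. Quartiles are found with the r = kn/4 rule of this topic, and the whiskers are taken to the most extreme values that lie inside the inner fences. The data are those of Example 7 below.

```python
import math
from fractions import Fraction


def quartile(xs, k):
    """Q_k of already-sorted xs by the r = k*n/4 rule of this topic."""
    r = Fraction(k * len(xs), 4)
    if r.denominator == 1:
        r = int(r)
        return (xs[r - 1] + xs[r]) / 2
    return xs[math.ceil(r) - 1]


def box_plot_summary(data):
    xs = sorted(data)
    q1, q2, q3 = (quartile(xs, k) for k in (1, 2, 3))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # lower inner fence
    upper_fence = q3 + 1.5 * iqr   # upper inner fence
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "five-number summary": (xs[0], q1, q2, q3, xs[-1]),
        "fences": (lower_fence, upper_fence),
        # whiskers end at the most extreme values inside the fences
        "whisker ends": (inside[0], inside[-1]),
        "outliers": outliers,
    }


example_7 = [40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
             51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66]
print(box_plot_summary(example_7))
```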
Symmetrical distribution: the whiskers are the same length and the median is in the centre of the box.
[Figure: symmetrical box plot, with Q1, Q2 and Q3 marked]
Positively skewed distribution: the left whisker is shorter than the right whisker and the median is nearer to Q1.
[Figure: positively skewed box plot, with Q1, Q2 and Q3 marked]
Negatively skewed distribution: the left whisker is longer than the right whisker and the median is nearer to Q3.
[Figure: negatively skewed box plot, with Q1, Q2 and Q3 marked]
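These three descriptions can be turned into a rough classifier. This is only a sketch (hypothetical helper `box_plot_shape`). It compares the whisker lengths and the position of the median within the box, and it uses exact equality for the symmetrical case. Reporting "mixed signals" when the two comparisons disagree is an assumption that goes beyond the rules stated above. The sample call uses (31, 50, 61, 66, 79), the five-number summary of the Example 7 data under the quartile rule of this topic.

```python
def box_plot_shape(minimum, q1, q2, q3, maximum):
    left_whisker, right_whisker = q1 - minimum, maximum - q3
    left_box, right_box = q2 - q1, q3 - q2  # where the median sits in the box
    if left_whisker == right_whisker and left_box == right_box:
        return "symmetrical"
    if left_whisker < right_whisker and left_box < right_box:
        return "positively skewed"  # median nearer Q1, longer right whisker
    if left_whisker > right_whisker and left_box > right_box:
        return "negatively skewed"  # median nearer Q3, longer left whisker
    return "mixed signals: inspect the plot"


print(box_plot_shape(31, 50, 61, 66, 79))
```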
Example 7
Data :
40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66
(a) Find the first, second and third quartiles and the lower and upper inner fences.
(b) Construct a box-and-whisker plot for the above data.
Solution
(a) Arrange data in ascending order
31, 32, 40, 41, 41, 45, 50, 51, 52, 54, 55, 57, 61,
61, 62, 65, 65, 66, 66, 66, 68, 69, 70, 72, 79
Number of observations, n = 25, min = 31, max = 79
(b)
Mean
If a set of grouped data is given in a frequency distribution, for example in the form of class intervals, the mean is defined as
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum fx}{\sum f}$$
where $x_i$ is the midpoint of the $i^{\text{th}}$ class and $f_i$ is the corresponding frequency.
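Here is a minimal Python sketch of the grouped-data mean. It assumes each class is given by its lower and upper limits, so the midpoint is their average. The helper name is hypothetical, and the frequencies are those of the battery data in Example 1 below.

```python
def grouped_mean(classes, freqs):
    """x-bar = sum(f_i * x_i) / sum(f_i), where x_i is the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return sum_fx / sum(freqs)


battery_classes = [(1.5, 1.9), (2.0, 2.4), (2.5, 2.9), (3.0, 3.4),
                   (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]
battery_freqs = [2, 1, 4, 15, 10, 5, 3]
print(grouped_mean(battery_classes, battery_freqs))
```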
Example 1
Find the midpoint of each class interval and, using the formula, calculate the mean life of the 40 batteries.
Life    Number of batteries, f
1.5 - 1.9 2
2.0 - 2.4 1
2.5 - 2.9 4
3.0 - 3.4 15
3.5 - 3.9 10
4.0 - 4.4 5
4.5 - 4.9 3
Solution
Life        f     Midpoint, x_i              f_i x_i
1.5 - 1.9   2     (1.5 + 1.9)/2 = 1.7
2.0 - 2.4   1     (2.0 + 2.4)/2 = 2.2
2.5 - 2.9   4
3.0 - 3.4   15
3.5 - 3.9   10
4.0 - 4.4   5
4.5 - 4.9   3
Sum of frequencies, Σf_i = 40              Σf_i x_i =
Mean, $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} =$
Mode
Example 2
15 - 19 1
20 - 24 4
25 - 29 22
30 - 34 35
35 - 39 20
40 - 44 8
Solution
15 – 19 1
20 – 24 4
25 – 29 22
40 – 44 8
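The formula for the mode of grouped data is not reproduced in this section. The sketch below assumes the usual form $\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right)c$, the same form applied to the income data in Example 3 of the skewness part. Here L is the lower boundary of the modal class, $d_1$ and $d_2$ are the differences between the modal frequency and the frequencies of the classes just before and after it, and c is the class width. The helper name is hypothetical. Treating a missing neighbouring class as frequency 0 is an assumption.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + d1 / (d1 + d2) * c for the (first) modal class.

    boundaries: (lower, upper) class boundaries; d1 and d2 compare the modal
    frequency with the classes just before and after it (a missing
    neighbour counts as frequency 0, an assumption).
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # first modal class
    lower, upper = boundaries[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return lower + d1 / (d1 + d2) * (upper - lower)


# Example 2: classes 15-19, ..., 40-44 have boundaries 14.5-19.5, ..., 39.5-44.5
bounds = [(14.5 + 5 * i, 19.5 + 5 * i) for i in range(6)]
print(grouped_mode(bounds, [1, 4, 22, 35, 20, 8]))
```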
Median
The median of a frequency distribution cannot be found in the same way as for ungrouped data, because the data have been grouped into classes. The median is the value that divides the ordered observations into two halves: 50% of the observations lie on either side of it.
The median class should be determined first, before calculating the median. The median lies at the $\left(\frac{n}{2}\right)^{\text{th}}$ observation, which is located by referring to the cumulative frequency.
An estimate of the median is then given by the formula
$$\text{Median} = L_k + \left(\frac{\frac{n}{2} - F_{k-1}}{f_k}\right) C$$
Where;
$L_k$ = the lower class boundary of the median class
n = the number of data, or the sum of frequencies
$F_{k-1}$ = the cumulative frequency of the class before the median class
$f_k$ = the frequency of the median class
C = the class width of the median class
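The median formula can be sketched as a single pass through the cumulative frequencies. The helper `grouped_median` is hypothetical and expects class boundaries, not class limits. The sample input is the frequency table of Example 2, with boundaries from 14.5 to 44.5.

```python
def grouped_median(boundaries, freqs):
    """Median = L_k + (n/2 - F_(k-1)) / f_k * C, using class boundaries."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    target = n / 2          # position of the median: the (n/2)th observation
    cumulative = 0          # F_(k-1), cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:  # first class reaching n/2
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequencies do not reach n/2")


bounds = [(14.5 + 5 * i, 19.5 + 5 * i) for i in range(6)]
print(grouped_median(bounds, [1, 4, 22, 35, 20, 8]))
```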
Example 3 (a)
Median = $\left(\frac{50}{2}\right)^{\text{th}}$ observation = 25th observation
Then calculate median,
Median
Example 3 (b)
$N = 27$, $\frac{N}{2} = \frac{27}{2} = 13.5$, so the median class is the class from 18 to 23.
Median, m =
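Quartiles
$$Q_k = L_k + \left(\frac{\frac{k}{4}n - F_{k-1}}{f_k}\right) c_k; \quad k = 1, 2, 3.$$
Percentiles
For grouped data, the $k^{\text{th}}$ percentile is
$$P_k = L_k + \left(\frac{\frac{k}{100}n - F_{k-1}}{f_k}\right) c_k; \quad k = 1, 2, 3, \ldots, 99.$$
Here $L_k$, $F_{k-1}$, $f_k$ and $c_k$ are the lower class boundary, the cumulative frequency before the class, the frequency and the class width of the class that contains $Q_k$ (or $P_k$).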
Note:
i. The 25th percentile is called the first quartile, Q1.
ii. The median is the 50th percentile, also called the second quartile, Q2.
iii. The 75th percentile is called the third quartile, Q3.
iv. The interquartile range is the range between the first quartile and the third quartile, (Q3 – Q1).
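$Q_k$ and $P_k$ differ from the median only in the target position, which changes from $\frac{n}{2}$ to $\frac{k}{4}n$ or $\frac{k}{100}n$, so one sketch covers all of them. The helper below is hypothetical. Its sample use also confirms Notes i and ii numerically (P25 = Q1, and P50 = Q2). The sample input is the income table of Example 3 in the skewness part.

```python
def grouped_quantile(boundaries, freqs, k, parts):
    """Q_k (parts=4) or P_k (parts=100): L_k + (k/parts * n - F_(k-1)) / f_k * c_k."""
    n = sum(freqs)
    target = k * n / parts      # position of the required observation
    cumulative = 0              # F_(k-1)
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequencies never reach the required position")


income = [(300 + 200 * i, 500 + 200 * i) for i in range(11)]
workers = [1, 16, 39, 58, 60, 46, 22, 15, 15, 9, 10]
q1 = grouped_quantile(income, workers, 1, 4)
p25 = grouped_quantile(income, workers, 25, 100)
print(q1, p25)  # Note i: the 25th percentile equals Q1
```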
Example 4
For the frequency distribution given below,
Solution
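(a) Median $= L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) c$
Median $= \left(\frac{120}{2}\right)^{\text{th}}$ observation $= x_{(60)}$
Median =
(b) Quartile, $Q_k = L_k + \left(\frac{\frac{k}{4}n - F_{k-1}}{f_k}\right) c_k$
$Q_1 = \left(\frac{1}{4}(120)\right)^{\text{th}}$ observation
$Q_1 =$
(c) $Q_3 = \left(\frac{3}{4}(120)\right)^{\text{th}}$ observation
(e) Percentile, $P_k = L_k + \left(\frac{\frac{k}{100}n - F_{k-1}}{f_k}\right) c_k$
10th percentile, $P_{10} = \left(\frac{10}{100}(120)\right)^{\text{th}}$ observation $= x_{(12)}$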
70th percentile, $P_{70} = \left(\frac{70}{100}(120)\right)^{\text{th}}$ observation $= x_{(84)}$
Thus, $P_{70}$ is in the fifth class with boundaries (59.5 – 69.5).
$P_{70} =$
LEARNING OUTCOMES: At the end of this lesson, students should be able to:
a) find and interpret the variance and standard deviation for ungrouped data.
b) find and interpret the variance and standard deviation for grouped data.
c) find and interpret Pearson's coefficient of skewness.
Variance and standard deviation are the most useful and widely used measures of dispersion. The standard deviation measures how spread out the values in a data set are:
i. If the data points are all close to the mean, the standard deviation is close to zero.
ii. If many data points are far from the mean, the standard deviation is far from zero.
iii. If all the data values are equal to the mean, the standard deviation is zero.
$$\text{variance, } s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n-1}; \quad \text{standard deviation, } s = \sqrt{s^2}$$
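The computational formula can be checked in Python. The sketch below uses hypothetical helpers `sample_variance` and `sample_sd` and divides by n − 1 as above. The standard library's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor. The sample data are Nik's and Fizz's marks from Example 3 below.

```python
import math


def sample_variance(xs):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_sd(xs):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(xs))


nik = [80, 80, 80, 80, 85]
fizz = [69, 78, 80, 80, 98]
print(sample_variance(nik), sample_variance(fizz))
print(round(sample_sd(nik), 2), round(sample_sd(fizz), 2))
```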
Example 1
Find the mean, variance and standard deviation for the data below.
2, 7, 10, 9, 2, 5, 16
Solution
x      x²
2      4
7
10
9
2
5
16
Σx =   Σx² =
Mean, $\bar{x} =$
Variance, $s^2 =$
Standard deviation, $s =$
Example 2
Find the mean and standard deviation for the above data and interpret the values obtained.
Solution
Data I: $\sum x = 143$, mean $= \frac{143}{11} = 13$
$\sum x^2 = 8^2 + 18^2 + 9^2 + 10^2 + 12^2 + 16^2 + 13^2 + 15^2 + 16^2 + 13^2 + 13^2 = 1957$
$s^2 =$
Mean of data I =
Standard deviation of data I =
Data II: $\sum x = 143$, mean $= \frac{143}{11} = 13$
$\sum x^2 = 11^2 + 13^2 + 13^2 + 1^2 + 2^2 + 23^2 + 13^2 + 14^2 + 15^2 + 18^2 + 20^2 = 2307$
$s^2 =$
Mean of data II =
Standard deviation of data II =
In conclusion, data II has the greater standard deviation, so it is more dispersed and less consistent than data I.
Example 3
The data below show the marks obtained by Nik and Fizz in five tests:
Solution
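Nik's marks: mean $= \frac{80 + 80 + 80 + 80 + 85}{5} = 81$
$\sum x = 405$, $\sum x^2 = 80^2 + 80^2 + 80^2 + 80^2 + 85^2 = 32825$
$$s^2 = \frac{32825 - \frac{405^2}{5}}{4} = 5, \quad s = \sqrt{\frac{32825 - \frac{405^2}{5}}{4}} = 2.24$$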
Fizz's marks: mean $= \frac{69 + 78 + 80 + 80 + 98}{5} = 81$
$\sum x = 405$, $\sum x^2 = 69^2 + 78^2 + 80^2 + 80^2 + 98^2 = 33249$
$$s^2 = \frac{33249 - \frac{405^2}{5}}{4} = 111$$
Standard deviation, $s = \sqrt{111} = 10.54$
By comparing the variances and standard deviations, Fizz's marks show a greater dispersion, which means that Fizz's marks are less consistent. Since both students have the same mean mark of 81, it can be concluded that Nik shows the better (more consistent) overall performance.
Example 4
The systolic blood pressures of 10 patients are given below.
165 135 151 155 158 146 149 124 162 173
a) Find the mean and the standard deviation of the systolic blood pressure of the 10 patients.
b) Find the number of patients whose systolic blood pressures lie more than one standard deviation above or below the mean.
Solution
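a) The mean, $\bar{x} = \frac{165+135+151+155+158+146+149+124+162+173}{10} = 151.80$
$\sum x = 1518$, $\sum x^2 = 232306$
$$s^2 = \frac{232306 - \frac{1518^2}{10}}{9} = 208.1778, \quad s = \sqrt{208.1778} = 14.43$$
b) One standard deviation from the mean gives the range $(151.80 - 14.43,\ 151.80 + 14.43) = (137.37,\ 166.23)$.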
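Part (b) amounts to counting the values outside the interval $(\bar{x} - s,\ \bar{x} + s)$. Here is a minimal sketch with a hypothetical helper. It assumes that "more than one standard deviation" means strictly outside the interval.

```python
import math


def outside_one_sd(xs):
    """Return mean, s, the interval (mean - s, mean + s) and the values outside it."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))
    low, high = mean - s, mean + s
    outside = [x for x in xs if x < low or x > high]  # strictly outside
    return mean, s, (low, high), outside


pressures = [165, 135, 151, 155, 158, 146, 149, 124, 162, 173]
mean, s, interval, outside = outside_one_sd(pressures)
print(round(mean, 2), round(s, 2), interval, outside)
```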
$$\text{Variance, } s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n-1}, \quad \text{standard deviation, } s = \sqrt{s^2}$$
where $n = \sum f$.
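For grouped data, only the sums change. Here is a minimal sketch with a hypothetical helper, using the midpoints and frequencies of the bread masses in Example 7 below.

```python
import math


def grouped_variance(midpoints, freqs):
    """s^2 = (sum f x^2 - (sum f x)^2 / n) / (n - 1), with n = sum f."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Bread masses of Example 7: class midpoints (g) and frequencies
midpoints = [422, 427, 432, 437, 442]
freqs = [16, 24, 25, 18, 17]
variance = grouped_variance(midpoints, freqs)
print(round(variance, 4), round(math.sqrt(variance), 2))
```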
Example 5
Find the mean, variance and standard deviation for the data below.
Solution
Mean, $\bar{x} =$
Variance, $s^2 =$
Standard deviation, $s =$
Example 6
Wages per hour (RM) paid to temporary workers in the production and marketing departments of a factory are shown in the following table.
Find estimates for the median, mean and standard deviation of the hourly wages of all the temporary workers in the factory.
Solution
Wages    Number of temporary workers, f    x    fx    fx²
N = 240
Median
Mean
Variance,
Standard deviation,
Example 7
The frequency distribution table shows the masses of loaves of bread produced by a bakery.
Mass (g) 420 – 424 425 – 429 430 – 434 435 – 439 440 – 444
Frequency 16 24 25 18 17
Solution
Mass (g) / Class boundary    Midpoint, x    f    fx    fx²
419.5 – 424.5 422 16 6752 2849344
424.5 – 429.5 427 24 10248 4375896
429.5 – 434.5 432 25 10800 4665600
434.5 – 439.5 437 18 7866 3437442
439.5 – 444.5 442 17 7514 3321188
n = 100    Σfx = 43180    Σfx² = 18649470
a) Variance, $s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{N}}{N-1} = \frac{18649470 - \frac{43180^2}{100}}{99} = 43.8989$
c) Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{43180}{100} = 431.80$ g
One standard deviation from the mean, where $s = \sqrt{43.8989} = 6.63$:
$(431.80 - 6.63,\ 431.80 + 6.63) = (425.17,\ 438.43)$
Thus, the interval of mass of the loaves of bread allowed to be sold in the market is between 425.17g and
438.43g.
Example 8
The frequency distribution table shows the hourly wages of workers in a factory.
Solution
Wage (RM) / Class boundary    Midpoint, x    f    fx    fx²
4.5 – 7.5 6 9 54 324
7.5 – 10.5 9 16 144 1296
10.5 – 13.5 12 11 132 1584
13.5 – 16.5 15 8 120 1800
16.5 – 19.5 18 6 108 1944
n = 50    Σfx = 558    Σfx² = 6948
a) Variance, $s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{N}}{N-1} = \frac{6948 - \frac{558^2}{50}}{49} = 14.7086$
b) After a 20% increase in wages, each midpoint becomes the midpoint plus 20% of the midpoint (that is, 1.2 × the original midpoint):
Wage (RM) / Class boundary    Midpoint, x    f    fx    fx²
4.5 – 7.5 7.2 9 64.8 466.56
7.5 – 10.5 10.8 16 172.8 1866.24
10.5 – 13.5 14.4 11 158.4 2280.96
13.5 – 16.5 18 8 144 2592
16.5 – 19.5 21.6 6 129.6 2799.36
n = 50    Σfx = 669.6    Σfx² = 10005.12
c) New variance, $s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{N}}{N-1} = \frac{10005.12 - \frac{669.6^2}{50}}{49} = 21.1803$
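The two variances in Example 8 are related. Multiplying every value by a constant a multiplies the variance by $a^2$ and the standard deviation by |a|, so a 20% increase gives a factor of $1.2^2 = 1.44$. The sketch below, with a hypothetical helper, checks this against the table values.

```python
def grouped_variance(midpoints, freqs):
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


midpoints = [6, 9, 12, 15, 18]   # Example 8 wage midpoints (RM)
freqs = [9, 16, 11, 8, 6]
old = grouped_variance(midpoints, freqs)
new = grouped_variance([1.2 * x for x in midpoints], freqs)  # 20% increase
print(round(old, 4), round(new, 4), round(new / old, 4))
```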
NOTE: When Pearson's coefficient of skewness is very close to 0 (whether negative or positive), the distribution of the data is almost symmetrical.
Example 1
The following data are sorted in ascending order. Find Pearson's coefficient of skewness.
1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8
Solution
Mean = 2.48
median = 2.45
mode = 2.4
s = 0.8176
Sk =
or Sk =
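Pearson's coefficient of skewness is commonly written as $S_k = \frac{\text{mean} - \text{mode}}{s}$ or $S_k = \frac{3(\text{mean} - \text{median})}{s}$, the two forms used in the examples of this section. A sketch for ungrouped data follows. The helper is hypothetical, and returning `None` for the mode form when there is no single mode is an assumption.

```python
import math
from collections import Counter


def pearson_skewness(data):
    """Return (3(mean - median)/s, (mean - mode)/s); the second is None without a single mode."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    mid = n // 2
    median = xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2
    s = math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))
    counts = Counter(xs)
    top = max(counts.values())
    modes = [value for value, count in counts.items() if count == top]
    sk_median = 3 * (mean - median) / s
    sk_mode = (mean - modes[0]) / s if top > 1 and len(modes) == 1 else None
    return sk_median, sk_mode


data = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8]
print(pearson_skewness(data))
```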
Example 2
The management of a large hospital recorded the ages of a random sample of 160 patients. The results of this survey are shown in the following table.
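Solution
a) Median falls under the $\left(\frac{\sum f}{2}\right)^{\text{th}}$ observation
= falls under the $\left(\frac{160}{2}\right)^{\text{th}}$ observation
$$\text{Median} = L_{med} + \left(\frac{\frac{\sum f}{2} - F_{med}}{f_{med}}\right) c$$
b) mean =
c) mode =
d) s =
$S_k =$ or $S_k =$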
Example 3
The following table gives the frequency distribution of 291 workers of a factory according to their average monthly income in 1995 - 2005. Find Pearson's coefficient of skewness. Then comment on the value obtained.
Monthly income (RM)    Number of workers, f
300 - 500 1
500-700 16
700-900 39
900-1100 58
1100-1300 60
1300-1500 46
1500-1700 22
1700-1900 15
1900-2100 15
2100-2300 9
2300 - 2500 10
Solution
Income group f c.f.
300 - 500 1 1
500 - 700 16 17
700 – 900 39 56
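n = Σf = 291
Median falls under the $\left(\frac{\sum f}{2}\right)^{\text{th}}$ observation
= falls under the $\left(\frac{291}{2}\right)^{\text{th}}$ observation
= the 145.5th observation (the 146th item), which lies in the (1100 – 1300) class interval.
$$\text{Median} = L_{med} + \left(\frac{\frac{\sum f}{2} - F_{med}}{f_{med}}\right) c = 1100 + \left(\frac{\frac{291}{2} - 114}{60}\right)(200) = \text{RM}1205$$
$$\text{Mode} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) c = 1100 + \left(\frac{2}{2 + 14}\right)(200) = \text{RM}1125$$
Using the class midpoints 400, 600, …, 2400: Σfx = 371600 and Σfx² = 531520000, so
Mean $= \frac{371600}{291} = \text{RM}1276.98$ and $s = \sqrt{\frac{531520000 - \frac{371600^2}{291}}{290}} = \text{RM}443.32$
$$S_k = \frac{3(\text{mean} - \text{median})}{s} = \frac{3(1276.98 - 1205)}{443.32} = 0.487$$
or
$$S_k = \frac{\text{mean} - \text{mode}}{s} = \frac{1276.98 - 1125}{443.32} = 0.343$$
Since Pearson's coefficient is positive, the distribution is slightly skewed to the right (positively skewed).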
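The calculation for Example 3 can be re-derived from the table with a short script. This sketch assumes class midpoints from 400 to 2400 in steps of 200 and the n − 1 divisor used throughout this topic. It is a check on the hand working, not a prescribed method.

```python
import math

# Monthly income classes (RM) and numbers of workers from Example 3.
boundaries = [(300 + 200 * i, 500 + 200 * i) for i in range(11)]
freqs = [1, 16, 39, 58, 60, 46, 22, 15, 15, 9, 10]

n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in boundaries]
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
mean = sum_fx / n
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

# Median: locate the class holding the (n/2)th observation.
cumulative = 0
for (lo, hi), f in zip(boundaries, freqs):
    if cumulative + f >= n / 2:
        median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Mode: L + d1 / (d1 + d2) * c in the modal class (not at either end here).
i = freqs.index(max(freqs))
d1, d2 = freqs[i] - freqs[i - 1], freqs[i] - freqs[i + 1]
lo, hi = boundaries[i]
mode = lo + d1 / (d1 + d2) * (hi - lo)

sk_median = 3 * (mean - median) / s
sk_mode = (mean - mode) / s
print(f"n={n} mean={mean:.2f} median={median:.2f} mode={mode:.2f} s={s:.2f}")
print(f"Sk (median form)={sk_median:.3f}  Sk (mode form)={sk_mode:.3f}")
```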