Chapter 3
• For example, tables or graphs are used to organize data, and descriptive values such as the
average score are used to summarize data.
• A descriptive value for a population is called a parameter and a descriptive value for a
sample is called a statistic.
• Descriptive statistics can be used to describe data on a single variable. There are three
major techniques for describing sample data: graphical techniques, measures of central
tendency and measures of dispersion.
• Numerical descriptive measures and graphical techniques are used to present information
about the data being studied.
• Graphical techniques alone cannot convey the whole picture of a frequency table. Therefore,
measures of central tendency and measures of dispersion are important for better data
interpretation.
• Descriptive statistics and graphical techniques can also be the final product of a statistical
analysis.
©NOOR MAIZATUL NAZUHA MOHAMAD
3.1 GRAPHICAL TECHNIQUES
3.1.2 Organizing and Graphing Qualitative Data
Once the raw data are obtained, we can tabulate them in an ordered manner in a frequency
table or a contingency table, or present them graphically using a pie chart or bar chart.
Frequency Table
• A frequency table for qualitative data lists all categories and the number of elements
that belong to each of the categories.
• Using the data below, let's construct a frequency table.
A, B, D, B, C, C, C, A, A, B, D, D, B, D, C, B, D, A, D

Table 1: Frequency table
Class    Frequency
A        4
B        5
C        4
D        6
Total    19

From Table 1, we can conclude that the class with the highest frequency is D (6), followed
by B (5), while A and C share the same frequency (4).
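The tally behind Table 1 can be reproduced with a short Python sketch. It relies on the standard-library `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# Raw qualitative data from the frequency table example
data = ["A", "B", "D", "B", "C", "C", "C", "A", "A", "B",
        "D", "D", "B", "D", "C", "B", "D", "A", "D"]

# Counter tallies how many elements fall into each category
freq = Counter(data)

print("Class  Frequency")
for category in sorted(freq):
    print(f"{category:<6} {freq[category]}")
print(f"Total  {sum(freq.values())}")
```

`freq.most_common(1)` returns the category with the highest frequency, which here is D.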
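Contingency Table
• A contingency table (cross-tabulation) shows the frequencies for two qualitative
variables at the same time, such as gender and department in Table 2.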
Table 2: Number of Staff for each Department in Company XYZ
                      Department
Gender     Marketing   Account   Management
Female         4           6          1
Male          18           2          3
Total         22           8          4
From Table 2, there are 22, 8 and 4 staff in the marketing, account and management
departments respectively. Most of the male staff work in the marketing department,
whereas most of the female staff work in the account department. Only 1 female staff
member works in the management department.
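As a rough sketch, the totals in Table 2 and the department with the most staff of each gender can be computed in Python. Storing the table as a nested dictionary is our own choice of layout, not something the notes prescribe.

```python
# Table 2 stored as {gender: {department: number of staff}}
table = {
    "Female": {"Marketing": 4, "Account": 6, "Management": 1},
    "Male": {"Marketing": 18, "Account": 2, "Management": 3},
}
departments = ["Marketing", "Account", "Management"]

# Column totals: number of staff in each department
col_totals = {d: sum(row[d] for row in table.values()) for d in departments}

# Row totals: number of staff of each gender
row_totals = {g: sum(row.values()) for g, row in table.items()}

# Department with the largest number of staff for each gender
largest = {g: max(row, key=row.get) for g, row in table.items()}

print("Department totals:", col_totals)
print("Gender totals:", row_totals)
print("Most staff:", largest)
```

The column totals match the Total row of Table 2, and the grand total is 34 staff.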
Bar chart
• A graph made of bars whose lengths represent the frequencies of the respective categories.
• Give the chart an appropriate title and label both axes.
• There are several types of bar chart, including:
✓ Vertical bar chart
✓ Horizontal bar chart
✓ Component bar chart
✓ Multiple bar chart
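• Vertical bar chart
To construct a vertical bar chart, mark the categories on the horizontal axis and the
frequencies on the vertical axis.
[Figure: Vertical bar chart of the frequencies of classes A, B, C and D]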
• Horizontal bar chart
To construct a horizontal bar chart, mark the categories on the vertical axis and the
frequencies on the horizontal axis.
[Figure: Horizontal bar charts with frequency on the horizontal axis, including one showing female and male staff in the Marketing, Account and Management departments]
• Multiple bar chart
✓ To construct a multiple bar chart, the bars representing the categories are
placed side by side in groups.
✓ The height of each bar represents the frequency of its category.
✓ Useful for making comparisons between groups.
[Figure: Multiple bar chart of female and male staff in the Marketing, Account and Management departments]
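Pie Chart
• A pie chart is a circle divided into sectors, one for each category. The angle of each
sector is proportional to the frequency of the category:
$$\text{Angle of sector} = \frac{f}{\sum f} \times 360^\circ$$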
Class    Frequency    Angle of Sector
A        4            (4/19) × 360° = 76°
B        5            (5/19) × 360° = 95°
C        4            (4/19) × 360° = 76°
D        6            (6/19) × 360° = 114°
Total    19
Table 3: Angle for each sector
[Figure: Pie chart of classes A, B, C and D]
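The sector angles in Table 3 follow from (frequency / total) × 360°. A minimal Python sketch, assuming the frequencies from Table 3:

```python
# Frequencies of each class (Table 3)
freq = {"A": 4, "B": 5, "C": 4, "D": 6}
total = sum(freq.values())

# Angle of sector = (frequency / total frequency) x 360 degrees
angles = {c: f / total * 360 for c, f in freq.items()}

for c, a in angles.items():
    print(f"{c}: {a:.2f} degrees, rounded to {round(a)}")
```

Rounded to whole degrees, the four angles add up to 361° rather than 360° because of rounding; the unrounded angles add up to exactly 360°.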
3.1.3 Organizing and Graphing Quantitative Data
For quantitative data, we will learn about time series graphs, frequency distribution tables,
histograms, frequency polygons, stem and leaf plots and ogives.
Time Series Graph
• A time series graph represents data that occur over a specific period of time.
• This type of graph is popular because its visual characteristics reveal data trends
clearly and it is easy to create.
• Two data sets can be compared on the same graph (called a compound time series
graph) if two lines are used.
• A line graph is a visual comparison of how two variables shown on the x-axis and
y-axis are related or vary with each other. It shows related information by drawing
a continuous line between all the points on a grid.
Example 1
A doctor wishes to use the following data in a presentation to show the trend in dengue
deaths from 2007 to 2015. Draw a time series graph for the data and summarize the
findings.
Solution:
[Figure: Dengue death, time series graph of the number of dengue deaths by year]
The graph shows a rise in the number of dengue deaths from 2007 to 2010, followed by a
slow decrease in 2011 and 2012. However, from 2013 onwards, the number increases
dramatically until 2015.
Table 5 shows an example of quantitative data, where the variable involved is
examination scores.
20 16 15 26 24 20 15 19 35
16 30 43 14 7 21 24 10 44
23 13 40 11 20 6 37 9 38
24 44 14 17 23 27 32 20 45
18 8 30 23 37 19 10 24 17
When working with large quantitative data sets, it is often helpful to organize and
summarize the data by constructing a table called a frequency distribution.
Scores     No. of Students, f
6 - 12          7
13 - 19        12
20 - 26        13
27 - 33         4
34 - 40         5
41 - 47         4
Total          45
3. Decide the starting point (lower limit) of the first class. Usually, choose the smallest value.
4. Identify the class limits, that is, the lower limit and upper limit of each class.
5. Determine the frequency of each class by counting or by using the tally method.
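Step 5 (counting or tallying) can be checked with a short Python sketch over the 45 scores in Table 5, assuming the class limits from the frequency distribution table above. Each class here includes both of its limits.

```python
# The 45 examination scores from Table 5
scores = [
    20, 16, 15, 26, 24, 20, 15, 19, 35,
    16, 30, 43, 14, 7, 21, 24, 10, 44,
    23, 13, 40, 11, 20, 6, 37, 9, 38,
    24, 44, 14, 17, 23, 27, 32, 20, 45,
    18, 8, 30, 23, 37, 19, 10, 24, 17,
]

# Class limits (lower, upper) of the frequency distribution
classes = [(6, 12), (13, 19), (20, 26), (27, 33), (34, 40), (41, 47)]

# Tally: count the scores between the lower and upper limit, inclusive
freqs = [sum(lo <= x <= hi for x in scores) for lo, hi in classes]

for (lo, hi), f in zip(classes, freqs):
    print(f"{lo:>2} - {hi:<2}  {f}")
print("Total:", sum(freqs))
```

Counting directly from the scores gives 4 scores in the 27 - 33 class (27, 30, 30, 32) and 5 in the 34 - 40 class (35, 37, 37, 38, 40).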
• For the data in Table 5, we decide the number of classes using Sturges' formula, where c
is the number of classes and n is the number of observations in the data set:
$$c = 1 + 3.3\log_{10} n = 1 + 3.3\log_{10} 45 = 6.455 \approx 6$$
So, the number of classes is 6.
• Then, from the class limits, we can identify the lower and upper class boundaries of each class.
$$\text{Lower class boundary} = \text{lower limit of the first class} - \frac{1}{2}$$
$$\text{Upper class boundary} = \frac{\text{upper limit of the class} + \text{lower limit of the next class}}{2}$$
Scores (Class Limit)   Class Boundary   No. of Students, f
6 - 12                 5.5 - 12.5        7
13 - 19                12.5 - 19.5      12
20 - 26                19.5 - 26.5      13
27 - 33                26.5 - 33.5       4
34 - 40                33.5 - 40.5       5
41 - 47                40.5 - 47.5       4
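• Hence, we can find the class width (class size) by subtracting the lower class boundary
from the upper class boundary:
$$\text{Class width} = \text{upper class boundary} - \text{lower class boundary}$$
For example, the width of the first class is 12.5 - 5.5 = 7.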
• The class midpoint is the value in the middle of a class:
$$\text{Class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
For example, the midpoint of the first class is (6 + 12)/2 = 9.
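The calculations in this part (Sturges' formula, class boundaries, class width and class midpoints) can be sketched in Python as below. This assumes whole-number scores, so adjacent classes are one unit apart and each boundary sits half a unit from the limits. Rounding c to the nearest whole number follows the worked example.

```python
import math

n = 45  # number of observations in Table 5

# Sturges' formula: c = 1 + 3.3 log10(n), rounded to the nearest whole number
c_exact = 1 + 3.3 * math.log10(n)
c = round(c_exact)

classes = [(6, 12), (13, 19), (20, 26), (27, 33), (34, 40), (41, 47)]

rows = []
for i, (lo, hi) in enumerate(classes):
    # Lower boundary: half a unit below the lower limit
    lower_b = lo - 0.5
    # Upper boundary: halfway between this upper limit and the next lower limit;
    # the last class has no next class, so add half a unit instead
    if i + 1 < len(classes):
        upper_b = (hi + classes[i + 1][0]) / 2
    else:
        upper_b = hi + 0.5
    width = upper_b - lower_b   # class width
    midpoint = (lo + hi) / 2    # class midpoint
    rows.append((lo, hi, lower_b, upper_b, width, midpoint))

print(f"c = {c_exact:.3f}, so use {c} classes")
for lo, hi, lb, ub, w, m in rows:
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, width {w}, midpoint {m}")
```

Every class in this table has width 7, so the boundaries of adjacent classes meet exactly.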
Example 2
Researchers collected data from employees by asking for the approximate travelling
distance (in miles) from their home to the office. The raw data are as follows:
1 2 6 7 12 13 2 6 9 5
18 7 3 15 15 4 17 1 14 5
4 16 4 5 8 6 5 20 5 2
9 11 12 1 9 2 10 11 4 10
9 19 8 8 4 14 7 3 2 6
Stem and Leaf Plot
• A stem and leaf plot is a method for visualizing the frequency with which certain
classes of values occur.
• In a stem and leaf plot, each value is separated into two parts: the stem (on the
left-hand side) and the leaf (on the right-hand side).
• The plot can help us understand the shape of the distribution of the data: symmetric,
skewed to the left or skewed to the right.
✓ Symmetric – if a line is drawn down the middle of the graph, the two sides
mirror each other.
✓ Skewed to the left – asymmetric (unbalanced), with a long tail towards the
smaller values; most of the data lie towards the larger values.
✓ Skewed to the right – asymmetric (unbalanced), with a long tail towards the
larger values; most of the data lie towards the smaller values.
• In constructing a stem and leaf plot, a key must be included to explain the meaning
of entries.
Example 3
The results of 22 students in a quiz of 30 multiple-choice questions are recorded as follows.
Display the data in a stem and leaf plot.
12 24 15 19 14 10 31 28 16 13 7
14 16 39 8 26 27 16 9 16 8 23
Solution:
Identify the stem and leaf (split each value into two parts). From the data, the minimum
value is 7 and the maximum value is 39, so each value has at most two digits. The first
(tens) digit, 0, 1, 2 or 3, is used as the stem, while the leaf is the second (units) digit.
Draw a vertical line to separate the stems from the leaves, with the stems on the left-hand
side and the leaves on the right-hand side.
Stem | Leaf
   0 | 7 8 8 9
   1 | 0 2 3 4 4 5 6 6 6 6 9
   2 | 3 4 6 7 8
   3 | 1 9
Key: 1 | 2 means 12
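A small Python sketch that builds the stem and leaf plot from the 22 quiz results, taking the tens digit as the stem and the units digit as the leaf. It assumes non-negative whole numbers.

```python
from collections import defaultdict

# Quiz results of the 22 students (Example 3)
marks = [12, 24, 15, 19, 14, 10, 31, 28, 16, 13, 7,
         14, 16, 39, 8, 26, 27, 16, 9, 16, 8, 23]

# Split each value into a stem (tens digit) and a leaf (units digit);
# sorting first keeps the leaves on each stem in ascending order
plot = defaultdict(list)
for x in sorted(marks):
    stem, leaf = divmod(x, 10)
    plot[stem].append(leaf)

print("Stem | Leaf")
for stem in range(min(plot), max(plot) + 1):  # also shows empty stems, if any
    print(f"{stem:>4} | {' '.join(map(str, plot[stem]))}")
print("Key: 1 | 2 means 12")
```

Counting the leaves is a quick check: the rows hold 4, 11, 5 and 2 leaves, which add up to the 22 students.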
Histogram
• A histogram is a graph that displays the data using continuous vertical bars of various
heights to represent the frequencies of the classes.
• The bars in a histogram are drawn adjacent to each other, without any gaps
between them.
• The steps for constructing a histogram are as follows:
1. Draw and label the x-axis and y-axis. The x-axis is always the horizontal axis
and the y-axis is always the vertical axis.
2. Represent the class boundaries on the x-axis and the frequencies on the y-axis.
3. Using the frequencies as the heights, draw a vertical bar for each class.
Example 4
Table 7 below shows a summary of data that have been collected in a research
study.
Class Limit   Class Boundary   Midpoint, x   Frequency   Relative Frequency (%)
1-5           0.5-5.5          3             3           15
6-10          5.5-10.5         8             7           35
11-15         10.5-15.5        13            4           20
16-20         15.5-20.5        18            3           15
21-25         20.5-25.5        23            1           5
26-30         25.5-30.5        28            2           10
Total                                        20          100
[Figure: Histogram of frequency against class boundary, 0.5-5.5 to 25.5-30.5]
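To check the relative frequency column of Table 7 and get a quick picture of the histogram without plotting software, the following Python sketch prints one `#` per observation in each class. The text histogram is drawn sideways, so each row stands for one bar.

```python
# Class boundaries and frequencies from Table 7 (Example 4)
boundaries = [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5),
              (15.5, 20.5), (20.5, 25.5), (25.5, 30.5)]
freqs = [3, 7, 4, 3, 1, 2]
total = sum(freqs)

# Relative frequency (%) = frequency / total x 100
rel = [f * 100 / total for f in freqs]

# Sideways text histogram: one '#' per observation in each class
for (lb, ub), f, r in zip(boundaries, freqs, rel):
    print(f"{lb:>4}-{ub:<4} | {'#' * f:<7} {f}  ({r:.0f}%)")
print("Total:", total)
```

The relative frequencies add up to 100%, matching the Total row of the table.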
Frequency Polygon
• An alternative way to present a frequency distribution is a frequency polygon.
• This is a graph that displays the data by plotting the frequencies against the class
midpoints, or by joining the midpoints of the tops of the histogram bars.
• Frequency polygons are often used to compare the distributions of two different sets
of data.
[Figure: Frequency polygon of frequency against midpoint]
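The points of a frequency polygon are (class midpoint, frequency) pairs. A Python sketch using the Table 7 class limits and frequencies follows. The extra zero-frequency points at each end are an optional convention that we assume here; the figure above may not include them.

```python
# Class limits and frequencies from Table 7
limits = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs = [3, 7, 4, 3, 1, 2]

# Class midpoint = (lower limit + upper limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Points to plot and join with straight lines
points = list(zip(midpoints, freqs))

# Optional (assumed convention): close the polygon on the x-axis by adding
# a zero-frequency point one class width beyond each end
width = midpoints[1] - midpoints[0]
closed = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

print(points)
print(closed)
```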
Cumulative Frequency Curve (Ogive)
• Another suitable way to present data is cumulative frequency.
• Cumulative frequency is the sum of the frequencies accumulated up to the upper
boundary of each class.
• Before drawing a cumulative frequency curve, a cumulative frequency table, which
includes a column of upper class boundaries, must be constructed first.
• The steps for constructing a cumulative frequency curve (ogive) are as follows:
1. Find the cumulative frequency for each class.
2. Draw the x-axis and y-axis. Label the x-axis with the class boundaries. Use an
appropriate scale on the y-axis to represent the cumulative frequencies.
3. Plot the cumulative frequency at each upper class boundary and join the points.
Class Limit   Class Boundary   Midpoint, x   Frequency   Cumulative Frequency
1-5           0.5-5.5          3             3           3
6-10          5.5-10.5         8             7           10
11-15         10.5-15.5        13            4           14
16-20         15.5-20.5        18            3           17
21-25         20.5-25.5        23            1           18
26-30         25.5-30.5        28            2           20
Table 7: The frequency distribution of the data
[Figure: Ogive of cumulative frequency against upper boundary]
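The cumulative frequency column is a running total of the frequencies. A Python sketch using `itertools.accumulate` on the Table 7 frequencies; starting the ogive at (0.5, 0), the lower boundary of the first class, is a common convention assumed here.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Table 7
upper_boundaries = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [3, 7, 4, 3, 1, 2]

# Cumulative frequency: running total of the frequencies
cum_freqs = list(accumulate(freqs))

# Ogive points: (upper class boundary, cumulative frequency),
# starting from zero at the lower boundary of the first class
ogive = [(0.5, 0)] + list(zip(upper_boundaries, cum_freqs))

print(cum_freqs)
print(ogive)
```

The last cumulative frequency always equals the total number of observations (20 here), which is a useful check on the table.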
3.2 MEAN, MODE, MEDIAN AND STANDARD DEVIATION
• A measure of central tendency is a single value that attempts to describe a set of data
by identifying the central position within that set of data. As such, measures of
central tendency are sometimes called measures of central location.
• The mean is probably the measure of central tendency that you are most familiar
with, but there are others, such as the median and the mode.
• The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency are more appropriate
to use than others.
• What are the mean, median and mode in statistics?
✓ The mean is the average of a data set.
✓ The mode is the most common value in a data set.
✓ The median is the middle value of the data set when the values are ordered.
• To compute the mean, add all the values and divide the total by the sample size.
• The mode is the value that occurs most often in the data set.
• To find the median, arrange the data in ascending or descending order; the middle value is
the median. If there is an even number of values, the median is the average of the two
middle values.
• Use the mean to describe the sample with a single value that represents the center
of the data. Many statistical analyses use the mean as a standard measure of the
center of the distribution of the data.
• When your data contain unusual values (outliers), compare the mean and the median to
decide which is the better measure to use; the median is less affected by outliers. If
your data are symmetric, the mean and median are equal or very close.
• The standard deviation is the most common measure of dispersion.
• Roughly speaking, the standard deviation measures the typical distance of the data
points from the mean: the larger the standard deviation, the more spread out the data.
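The mean, median, mode and standard deviation can be computed with Python's standard-library `statistics` module. The sketch below applies it to the 22 quiz results from Example 3. Note that `statistics.stdev` gives the sample standard deviation, which divides by n - 1.

```python
import statistics

# Quiz results of the 22 students (Example 3)
marks = [12, 24, 15, 19, 14, 10, 31, 28, 16, 13, 7,
         14, 16, 39, 8, 26, 27, 16, 9, 16, 8, 23]

mean = statistics.mean(marks)      # sum of the values / sample size
median = statistics.median(marks)  # middle value; average of the two middle values when n is even
mode = statistics.mode(marks)      # most frequent value
sd = statistics.stdev(marks)       # sample standard deviation (divisor n - 1)

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}, sd = {sd:.2f}")
```

Here the mean (about 17.77) is larger than the median (16), which is consistent with the tail of high marks (31 and 39) in the stem and leaf plot. For data with several modes, `statistics.multimode` returns all of them, whereas `statistics.mode` returns only the first one it meets.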
Table 8: Best Central Tendency Measure
Step 2: Click the arrow button to move the variable into the variable box.
Step 4: Click Continue, then click OK.
Option 2
Step 1: Select the Graphs menu, select Legacy Dialogs, then choose the appropriate chart.
Descriptive
Step 1: Select the Analyze menu, select Descriptive Statistics, click Descriptives, then select
the appropriate variable.
Step 2: Click the arrow button to move the variable into the variable box.
Step 3: Click Options, then select the appropriate statistics.
Step 4: Click Continue, then click OK.
Figure 10
From Figure 10, the mean scores for the English and Mathematics subjects are 41.28 and 40.90
respectively, and the median values are 51 and 40 respectively. The standard deviations
for the two subjects are 24.074 and 17.522 respectively. The minimum and maximum scores
for English are 2 and 75, while those for Mathematics are 16 and 86. The skewness
coefficient for English is -0.547, which indicates that the distribution of students' marks
for English is skewed to the left. The skewness coefficient for Mathematics is 0.912, which
indicates that the distribution of students' marks for Mathematics is skewed to the right.
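The sign of a skewness coefficient can be illustrated with a Python sketch of the adjusted Fisher-Pearson coefficient, G1. We believe this is the formula SPSS reports, but treat that as an assumption and compare against your own SPSS output. The helper names `skewness` and `describe_skew` are hypothetical, and the data are the quiz results from Example 3.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1 (assumed to match SPSS).

    G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation. Needs n >= 3 and s > 0.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def describe_skew(g):
    # Negative: longer tail on the left; positive: longer tail on the right
    if g < 0:
        return "skewed to the left"
    if g > 0:
        return "skewed to the right"
    return "symmetric"

# Quiz results of the 22 students (Example 3)
marks = [12, 24, 15, 19, 14, 10, 31, 28, 16, 13, 7,
         14, 16, 39, 8, 26, 27, 16, 9, 16, 8, 23]

g = skewness(marks)
print(f"skewness = {g:.3f}: {describe_skew(g)}")
```

In practice a coefficient very close to zero is usually read as roughly symmetric rather than requiring an exact zero.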
EXERCISE
The table below shows the marks obtained by three different classes in a statistics test.
Enter the data into the SPSS Data Editor and find the following descriptive values for each
class.
a) Mean
b) Median
c) Mode
d) Standard deviation
e) Variance
f) Range
g) Skewness