Chapter2 091117004812 Phpapp01

This document provides an overview of descriptive statistics and organizing qualitative and quantitative data. It discusses frequency distributions, measures of central tendency, dispersion, and skewness. Graphical methods like bar graphs, pie charts, and line graphs are presented as ways to organize and visualize different types of data. The objectives are to help students create and interpret various graphs, calculate statistical measures, and understand how to summarize and describe data distributions.

Uploaded by

Mohamad Zamri Ibrahim

QQS1013

ELEMENTARY STATISTICS

CHAPTER 2
DESCRIPTIVE STATISTICS

2.1 Introduction
2.2 Organizing and Graphing Qualitative Data
2.3 Organizing and Graphing Quantitative Data
2.4 Central Tendency Measurement
2.5 Dispersion Measurement
2.6 Mean, Variance and Standard Deviation for
Grouped Data
2.7 Measure of Skewness

OBJECTIVES

After completing this chapter, students should be able to:

- Create and interpret graphical displays involving qualitative
and quantitative data.
- Describe the difference between grouped and ungrouped
frequency distribution, frequency and relative frequency,
relative frequency and cumulative relative frequency.
- Identify and describe the parts of a frequency distribution:
class boundaries, class width, and class midpoint.
- Identify the shapes of distributions.
- Compute, describe, compare and interpret the three
measures of central tendency: mean, median, and mode for
ungrouped and grouped data.
- Compute, describe, compare and interpret the two measures
of dispersion: range, and standard deviation (variance) for
ungrouped and grouped data.
- Compute, describe, and interpret the two measures of
position: quartiles and interquartile range for ungrouped and
grouped data.
- Compute, describe and interpret the measures of skewness:
Pearson Coefficient of Skewness.

2.1 Introduction

Raw data - Data recorded in the sequence in which they are
collected and before they are processed or ranked.

Array data - Raw data that is arranged in ascending or descending
order.

Example 1

Here is a list of questions asked in a large statistics class and the raw
data given by one of the students:

1. What is your sex (m=male, f=female)?
Answer (raw data): m

2. How many hours did you sleep last night?
Answer: 5 hours

3. Randomly pick a letter S or Q.
Answer: S

4. What is your height in inches?
Answer: 67 inches

5. What's the fastest you've ever driven a car (mph)?
Answer: 110 mph

Example 2

Quantitative raw data

Qualitative raw data

- These data are also called ungrouped data.

2.2 Organizing and Graphing Qualitative Data

2.2.1 Frequency Distributions/ Table
2.2.2 Relative Frequency and Percentage Distribution
2.2.3 Graphical Presentation of Qualitative Data

2.2.1 Frequency Distributions / Table

- A frequency distribution for qualitative data lists all categories and
the number of elements that belong to each of the categories.
- It shows how the frequencies are distributed over the various categories.
- It is also called a frequency distribution table or simply a frequency
table.
- The number of elements that belong to a certain category is called
the frequency of that category.


2.2.2 Relative Frequency and Percentage Distribution

- A relative frequency distribution is a listing of all categories along
with their relative frequencies (given as proportions or percentages).
- It is commonplace to give the frequency and relative frequency
distribution together.
- Calculating relative frequency and percentage of a category

Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)

Percentage = (Relative frequency) × 100

Example 3

A sample of UUM staff-owned vehicles produced by Proton was
identified and the make of each noted. The resulting sample follows (W =
Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv

Construct a frequency distribution table for these data with their relative
frequency and percentage.

Solution:

Category   Frequency   Relative Frequency   Percentage (%)
Wira          19       19/50 = 0.38         0.38 × 100 = 38
Iswara         8       0.16                 16
Perdana        4       0.08                  8
Waja          10       0.20                 20
Satria         5       0.10                 10
Savvy          4       0.08                  8
Total         50       1.00                100
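
The table in Example 3 can also be tallied by machine. The following is a minimal sketch in Python, assuming the 50 observations are typed in exactly as listed in the example; the function name `frequency_table` is illustrative and not part of the original notes.

```python
from collections import Counter

# Raw data from Example 3 (W = Wira, Is = Iswara, Wj = Waja,
# St = Satria, P = Perdana, Sv = Savvy).
raw = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(raw)
for cat, (f, rel, pct) in table.items():
    print(f"{cat:<3} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```

Each row holds the frequency, the relative frequency (frequency divided by the sum of all frequencies) and the percentage (relative frequency times 100), in the same order as the columns of the table.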

2.2.3 Graphical Presentation of Qualitative Data

1. Bar Graphs

- A graph made of bars whose heights represent the frequencies of
respective categories.
- Such a graph is most helpful when you have many categories to
represent.
- Notice that a gap is inserted between each of the bars.
- Types of bar chart:
=> simple/vertical bar chart
=> horizontal bar chart
=> component bar chart
=> multiple bar chart

Simple/ Vertical Bar Chart

- To construct a vertical bar chart, mark the various categories on the
horizontal axis and mark the frequencies on the vertical axis
- Refer to Figure 2.1 and Figure 2.2.

Figure 2.1 Figure 2.2

Horizontal Bar Chart

- To construct a horizontal bar chart, mark the various categories on
the vertical axis and mark the frequencies on the horizontal axis.
Example 4: Refer to Example 3.

Figure 2.3: UUM Staff-owned Vehicles Produced by Proton (horizontal bar chart; horizontal axis: Frequency, 0 to 20; vertical axis: Types of Vehicle, listing Wira, Iswara, Perdana, Waja, Satria and Savvy)

- Another example of horizontal bar chart: Figure 2.4

Figure 2.4: Number of students at Diversity College who are
immigrants, by last country of permanent residence

Component Bar Chart

- To construct a component bar chart, all categories are placed in one bar
and every bar is divided into components.
- The height of each component should tally with the frequency it
represents.

Example 5

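Suppose we want to illustrate the information below, representing
the number of people participating in the activities offered by an
outdoor pursuits centre during June of three consecutive years.

Activity    2004   2005   2006
Climbing      21     34     36
Caving        10     12     21
Walking       75     85    100
Sailing       36     36     40
Total        142    167    197

Solution:

Figure 2.5: Activities Breakdown (June), component bar chart (horizontal axis: Year; vertical axis: Number of participants; components: Climbing, Caving, Walking, Sailing)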

Multiple Bar Chart

- To construct a multiple bar chart, the bars that represent the
categories are gathered in groups.
- The height of each bar represents the frequency of its category.
- Useful for making comparisons (two or more values).
- Example 6: Refer to Example 5.

Figure 2.6: Activities Breakdown (June), multiple bar chart (horizontal axis: Year; vertical axis: Number of participants; bars: Climbing, Caving, Walking, Sailing)

- Another example of horizontal bar chart: Figure 2.7

Figure 2.7: Preferred snack choices of students at UUM

- The bar graphs for relative frequency and percentage distributions
can be drawn simply by marking the relative frequencies or
percentages, instead of the class frequencies.

2. Pie Chart

- A circle divided into portions that represent the relative frequencies
or percentages of a population or a sample belonging to different
categories.
- An alternative to the bar chart and useful for summarizing a single
categorical variable if there are not too many categories.
- The chart makes it easy to compare relative sizes of each
class/category.
- The whole pie represents the total sample or population. The pie is
divided into different portions that represent the different categories.
- To construct a pie chart, we multiply 360° by the relative frequency
for each category to obtain the degree measure or size of the angle
for the corresponding categories.

Example 7 (Table 2.6 and Figure 2.8):

Table 2.6 Figure 2.8

Example 8 (Table 2.7 and Figure 2.9):

Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy               54       0.27                 360 × 0.27 = 97.2°
Action               36       0.18                 360 × 0.18 = 64.8°
Romance              28       0.14                 360 × 0.14 = 50.4°
Drama                28       0.14                 360 × 0.14 = 50.4°
Horror               22       0.11                 360 × 0.11 = 39.6°
Foreign              16       0.08                 360 × 0.08 = 28.8°
Science Fiction      16       0.08                 360 × 0.08 = 28.8°
Total               200       1.00                 360°

Figure 2.9
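
The angle sizes for the movie genres in Example 8 follow directly from the rule "360° times the relative frequency". A minimal Python sketch of that rule is given below; the function name `pie_angles` is an illustrative choice, not something defined in these notes.

```python
def pie_angles(frequencies):
    """Return the pie-chart angle (in degrees) for each category."""
    total = sum(frequencies.values())
    # angle = 360 * relative frequency = 360 * f / (sum of all frequencies)
    return {cat: 360 * f / total for cat, f in frequencies.items()}


# Frequencies from Example 8.
genres = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}

angles = pie_angles(genres)
for genre, angle in angles.items():
    print(f"{genre:<16} {angle:6.1f} degrees")
```

The angles always add up to 360°, which is a quick check that no category has been left out.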
3. Line Graph/Time Series Graph

- A graph that represents data occurring over a specific period of
time.
- Line graphs are widely used because their visual characteristics
reveal data trends clearly and they are easy to create.
- When analyzing the graph, look for a trend or pattern that occurs
over the time period.
- For example, is the line ascending (indicating an increase over time) or
descending (indicating a decrease over time)?
- Another thing to look for is the slope, or steepness, of the line. A line
that is steep over a specific time period indicates a rapid increase or
decrease over that period.
- Two data sets can be compared on the same graph (called a
compound time series graph) if two lines are used.
- Data collected on the same element for the same variable at different
points in time or for different periods of time are called time series
data.
- A line graph is a visual comparison of how two variables (shown on
the x- and y-axes) are related or vary with each other. It shows
related information by drawing a continuous line between all the
points on a grid.
- Line graphs compare two variables: one is plotted along the x-axis
(horizontal) and the other along the y-axis (vertical).
- The y-axis in a line graph usually indicates quantity (e.g., RM,
number of sales, litres) or percentage, while the horizontal x-axis
often measures units of time. As a result, the line graph is often
viewed as a time series graph.
Example 9

A transit manager wishes to use the following data for a presentation
showing how Port Authority Transit ridership has changed over the
years. Draw a time series graph for the data and summarize the
findings.

Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4

Solution:

The graph shows a decline in ridership through 1992 and then leveling off
for the years 1993 and 1994.

[Time series graph of the data: horizontal axis: Year, 1990 to 1994; vertical axis: Ridership (in millions), 75 to 89]

Exercise 1

1. The following data show the method of payment by 16 customers in a
supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D =
debit and O = other.

C CK CK C CC D O C
CK CC D CC C CK CK CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

2. The frequency distribution table below represents the sales of certain products at ZeeZee
Company. Each product is given with its sales frequency over a certain
period. Find the relative frequency and the percentage of each product. Then,
construct a pie chart using the obtained information.

Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                    13
B                    12
C                     5
D                     9
E                    11

3. Draw a time series graph to represent the data for the number of worldwide airline
fatalities for the given years.

Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities    440    510    990    801    732    557   1132

4. A questionnaire about how people get news resulted in the following information
from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).

N N R T T
R N T M R
M M N R N
T R M N M
T R R N N

a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.

5. The given information shows the export and import trade in million RM for four
months of sales in a certain year. Using the information provided, present the
data in a component bar graph.

Month       Export   Import
September     28       20
October       30       28
November      32       17
December      24       14

6. The following information represents the maximum rainfall in millimetres (mm)
in each state in Malaysia. You are asked to help a meteorologist in your
area to make an analysis. Based on your knowledge, present this information
using the most appropriate chart and give your comments.

State                              Quantity (mm)
Perlis                                  435
Kedah                                   512
Pulau Pinang                            163
Perak                                   721
Selangor                                664
Wilayah Persekutuan Kuala Lumpur       1003
Negeri Sembilan                         390
Melaka                                  223
Johor                                   876
Pahang                                 1050
Terengganu                             1255
Kelantan                                986
Sarawak                                 878
Sabah                                   456

2.3 Organizing and Graphing Quantitative Data

2.3.1 Stem and Leaf Display
2.3.2 Frequency Distribution
2.3.3 Relative Frequency and Percentage
Distributions.
2.3.4 Graphing Grouped Data
2.3.5 Shapes of Histogram
2.3.6 Cumulative Frequency Distributions.

2.3.1 Stem-and-Leaf Display

- In a stem-and-leaf display of quantitative data, each value is
divided into two portions: a stem and a leaf. The leaves
for each stem are then shown separately in a display.
- It gives information about the pattern of the data.
- It can show which values are repeated frequently.

Example 10

25 12 9 10 5 12 23 7
36 13 11 12 31 28 37 6
14 41 38 44 13 22 18 19

Solution:

0 | 9 5 7 6
1 | 2 0 2 3 1 2 4 3 8 9
2 | 5 3 8 2
3 | 6 1 7 8
4 | 1 4

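
The display in Example 10 can be reproduced with a short Python sketch. It assumes non-negative whole numbers below 100, so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group leaves (units digits) under their stems (tens digits).

    Assumes non-negative integers; leaves keep the order in which
    the values appear in the raw data, as in Example 10.
    """
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))


# Data from Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the leaves within each stem (for example with `sorted(leaves)`) gives an ordered display, which makes repeated values such as 12 easier to spot.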
2.3.2 Frequency Distributions

- A frequency distribution for quantitative data lists all the classes and
the number of values that belong to each class.
- Data presented in the form of a frequency distribution are called grouped
data.

- The class boundary is given by the midpoint of the upper limit of
one class and the lower limit of the next class. It is also called the real class
limit.
- To find the midpoint of the upper limit of the first class and the
lower limit of the second class, we divide the sum of these two limits
by 2.

e.g.:

$$\text{class boundary} = \frac{400 + 401}{2} = 400.5$$

Class Width (class size)

Class width = Upper boundary - Lower boundary

e.g.:
Width of the first class = 600.5 - 400.5 = 200

Class Midpoint or Mark

$$\text{class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$

e.g.:

$$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$$

Constructing Frequency Distribution Tables

1. To decide the number of classes, we use Sturges' formula,
which is

$$c = 1 + 3.3 \log n$$

where c is the no. of classes
n is the no. of observations in the data set
(log is the logarithm to base 10).

2. Class width,

$$i > \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c}$$

This class width is rounded to a convenient number.

3. Lower Limit of the First Class or the Starting Point

Use the smallest value in the data set.

Example 11

The following data give the total home runs hit by all players of each of
the 30 Major League Baseball teams during the 2004 season.


Solution:

i) Number of classes, c = 1 + 3.3 log 30
= 1 + 3.3(1.48)
= 5.88 ≈ 6 classes

ii) Class width,

$$i > \frac{242 - 135}{6} = 17.8 \approx 18$$

iii) Starting Point = 135

Table 2.10 Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally         f
135 - 152         |||| ||||    10
153 - 170         ||            2
171 - 188         ||||          5
189 - 206         |||| |        6
207 - 224         |||           3
225 - 242         ||||          4
                            Σf = 30

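
The three construction steps of Example 11 (number of classes, class width, class limits) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are whole numbers (so consecutive classes are one unit apart), the class count is rounded to the nearest integer, and the width is rounded up to the next whole number as the "convenient number". All function names are hypothetical.

```python
import math


def sturges_class_count(n):
    """Sturges' formula as used in the text: c = 1 + 3.3 log10(n), rounded."""
    return round(1 + 3.3 * math.log10(n))


def class_width(smallest, largest, n_classes):
    """Range divided by the number of classes, rounded up (assumption)."""
    return math.ceil((largest - smallest) / n_classes)


def class_limits(smallest, width, n_classes):
    """Lower and upper limits of each class, assuming integer data."""
    return [(smallest + k * width, smallest + (k + 1) * width - 1)
            for k in range(n_classes)]


# Values from Example 11: n = 30, smallest = 135, largest = 242.
c = sturges_class_count(30)
i = class_width(135, 242, c)
limits = class_limits(135, i, c)

# Class boundaries lie halfway between adjacent class limits.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo} - {hi}   boundaries {lb} to {ub}")
```

With these inputs the sketch gives 6 classes of width 18, starting at 135 - 152 and ending at 225 - 242, matching Table 2.10.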
2.3.3 Relative Frequency and Percentage Distributions

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$

Percentage = (Relative frequency) × 100

Example 12 (Refer to Example 11)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries            Relative Frequency   %
135 - 152         134.5 to less than 152.5    0.3333               33.33
153 - 170         152.5 to less than 170.5    0.0667                6.67
171 - 188         170.5 to less than 188.5    0.1667               16.67
189 - 206         188.5 to less than 206.5    0.2                  20
207 - 224         206.5 to less than 224.5    0.1                  10
225 - 242         224.5 to less than 242.5    0.1333               13.33
                                        Sum   1.0                  100%

2.3.4 Graphing Grouped Data

1. Histograms

- A histogram is a graph in which the class boundaries are
marked on the horizontal axis and either the frequencies,
relative frequencies, or percentages are marked on the vertical
axis. The frequencies, relative frequencies or percentages are
represented by the heights of the bars.
- In a histogram, the bars are drawn adjacent to each other and
there is a space between the y-axis and the first bar.

Example 13 (Refer to Example 11)

Figure 2.10: Frequency histogram for Table 2.10 (horizontal axis: Total home runs; vertical axis: Frequency)

2. Polygon

A graph formed by joining the midpoints of the tops of
successive bars in a histogram with straight lines is called a
polygon.

Example 13

Figure 2.11: Frequency polygon for Table 2.10 (horizontal axis: Total home runs, class boundaries 134.5 to 242.5; vertical axis: Frequency, 0 to 12)

For a very large data set, as the number of classes is increased (and
the width of classes is decreased), the frequency polygon eventually
becomes a smooth curve called a frequency distribution curve or
simply a frequency curve.

Figure 2.12: Frequency distribution curve

2.3.5 Shape of Histogram

- The shape of a histogram is described in the same way as the shape of the
corresponding polygon or frequency curve.
- The most common shapes are:
(i) Symmetric

Figure 2.13 & 2.14: Symmetric histograms


(ii) Right skewed and (iii) Left skewed

Figure 2.15 & 2.16: Right skewed and Left skewed

- Describing data using graphs helps us gain insight into the main
characteristics of the data.
- When interpreting a graph, we should be very cautious. We should
observe carefully whether the frequency axis has been truncated or
whether any axis has been unnecessarily shortened or stretched.

2.3.6 Cumulative Frequency Distributions

- A cumulative frequency distribution gives the total number of
values that fall below the upper boundary of each class.

Example 14: Using the frequency distribution of Table 2.11,

Total Home Runs   Class Boundaries            Cumulative Frequency
135 - 152         134.5 to less than 152.5    10
153 - 170         152.5 to less than 170.5    10 + 2 = 12
171 - 188         170.5 to less than 188.5    10 + 2 + 5 = 17
189 - 206         188.5 to less than 206.5    10 + 2 + 5 + 6 = 23
207 - 224         206.5 to less than 224.5    10 + 2 + 5 + 6 + 3 = 26
225 - 242         224.5 to less than 242.5    10 + 2 + 5 + 6 + 3 + 4 = 30

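
The running totals in Example 14 are simply cumulative sums of the class frequencies. A minimal Python sketch, using the frequencies of Table 2.10 and the standard-library function `itertools.accumulate`, is shown below; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Table 2.10 (classes 135-152, ..., 225-242).
frequencies = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Cumulative frequency: number of values below each upper class boundary.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
```

The last cumulative frequency must equal the total number of observations (30 here), which is a useful check on the table.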
Ogive

- An ogive is a curve drawn for the cumulative frequency distribution
by joining with straight lines the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies
of respective classes.
- Two types of ogive:
(i) ogive less than
(ii) ogive greater than
- First, build a table of cumulative frequency.

Example 15 (Ogive Less Than)

Earnings (RM)   Number of students (f)
30 - 39          5
40 - 49          6
50 - 59          6
60 - 69          3
70 - 79          3
80 - 89          7
Total           30

Earnings (RM)     Cumulative Frequency (F)
Less than 29.5     0
Less than 39.5     5
Less than 49.5    11
Less than 59.5    17
Less than 69.5    20
Less than 79.5    23
Less than 89.5    30

Figure 2.17: Ogive less than (horizontal axis: Earnings, 29.5 to 89.5; vertical axis: Cumulative Frequency, 0 to 35)

Example 16 (Ogive Greater Than)

Earnings (RM)   Number of students (f)
30 - 39          5
40 - 49          6
50 - 59          6
60 - 69          3
70 - 79          3
80 - 89          7
Total           30

Earnings (RM)     Cumulative Frequency (F)
More than 29.5    30
More than 39.5    25
More than 49.5    19
More than 59.5    13
More than 69.5    10
More than 79.5     7
More than 89.5     0

Figure 2.18: Ogive greater than (horizontal axis: Earnings, 29.5 to 89.5; vertical axis: Cumulative Frequency, 0 to 35)

2.3.7 Box-Plot

- Describes the data graphically using five measurements:
the smallest value, first quartile (K1), second quartile (median or
K2), third quartile (K3) and largest value.

[Figure: three box plots, each marked with Smallest value, K1, Median, K3 and Largest value: one for symmetric data, one for left skewed data and one for right skewed data]

2.4 Measures of Central Tendency

2.4.1 Ungrouped Data
(1) Mean
(2) Weighted mean
(3) Median
(4) Mode
2.4.2 Grouped Data
(1) Mean
(2) Median
(3) Mode
2.4.3 Relationship among mean, median & mode

2.4.1 Ungrouped Data

1. Mean
- Mean for population data:

$$\mu = \frac{\sum x}{N}$$

- Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$

where: $\sum x$ = the sum of all values
N = the population size
n = the sample size
$\mu$ = the population mean
$\bar{x}$ = the sample mean

Example 17

The following data give the prices (rounded to thousand RM) of five
homes sold recently in Sekayang.

158 189 265 127 191

Find the mean sale price for these homes.

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$$

Thus, these five homes were sold for an average price of RM186
thousand (RM186 000).

- The mean has the advantage that its calculation includes each value
of the data set.

2. Weighted Mean

- Used when the values carry different weights (levels of importance).
- Weighted mean:

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where w is a weight.

Example 18

Consider the data on electrical components purchased from a factory in
the table below:

Type    Number of components (w)   Cost/unit (x)
1       1200                       RM3.00
2        500                       RM3.40
3       2500                       RM2.80
4       1000                       RM2.90
5        800                       RM3.25
Total   6000

Solution:

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1200(3) + 500(3.4) + 2500(2.8) + 1000(2.9) + 800(3.25)}{1200 + 500 + 2500 + 1000 + 800} = \frac{17800}{6000} = 2.967$$

The mean cost of a unit of the component is RM2.97.

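
The weighted mean of Example 18 can be checked with a few lines of Python. This is a minimal sketch; the function name `weighted_mean` and the variable names are illustrative, not part of the original notes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by sum of w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 18: cost per unit (x) weighted by number of components (w).
costs = [3.00, 3.40, 2.80, 2.90, 3.25]
units = [1200, 500, 2500, 1000, 800]

mean_cost = weighted_mean(costs, units)
print(f"Mean cost per unit: RM{mean_cost:.2f}")
```

Note that the unweighted mean of the five unit costs would be RM3.07; the weighted mean is lower because the cheapest component (type 3) is bought in the largest quantity.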
3. Median

- Median is the value of the middle term in a data set that has been
ranked in increasing order.
- Procedure for finding the Median
Step 1: Rank the data set in increasing order.

Step 2: Determine the depth (position or location) of the median.

$$\text{Depth of Median} = \frac{n + 1}{2}$$

Step 3: Determine the value of the Median.

Example 19

Find the median for the following data:
10 5 19 8 3

Solution:
(1) Rank the data in increasing order
3 5 8 10 19

(2) Determine the depth of the Median

$$\text{Depth of Median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$

(3) Determine the value of the median

Therefore the median is located in the third position of the data set.

3 5 8 10 19

Hence, the Median for the above data = 8

Example 20

Find the median for the following data:
10 5 19 8 3 15

Solution:

(1) Rank the data in increasing order
3 5 8 10 15 19

(2) Determine the depth of the Median

$$\text{Depth of Median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$

(3) Determine the value of the Median

Therefore the median is located midway between the 3rd position and the 4th
position of the data set.

$$\text{Median} = \frac{8 + 10}{2} = 9$$

Hence, the Median for the above data = 9

- The median gives the center of a histogram, with half of the data
values to the left of (or, less than) the median and half to the right of
(or, more than) the median.
- The advantage of using the median is that it is not influenced by
outliers.

4. Mode

- Mode is the value that occurs with the highest frequency in a
data set.

Example 21

1. What is the mode for the given data?
77 69 74 81 71 68 74 73

2. What is the mode for the given data?
77 69 68 74 81 71 68 74 73

Solution:

1. Mode = 74 (this number occurs twice): Unimodal

2. Mode = 68 and 74: Bimodal

- A major shortcoming of the mode is that a data set may have
none or may have more than one mode.
- One advantage of the mode is that it can be calculated for both
kinds of data, quantitative and qualitative.
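
The ungrouped examples above (Examples 17, 19, 20 and 21) can be cross-checked with Python's standard `statistics` module. This sketch relies on the library routines rather than the depth procedure described in the text; the two approaches agree for these data. The variable names are illustrative.

```python
import statistics

# Example 17: prices (in thousand RM) of five homes.
prices = [158, 189, 265, 127, 191]
mean_price = statistics.mean(prices)

# Examples 19 and 20: median for an odd and an even number of values.
median_odd = statistics.median([10, 5, 19, 8, 3])
median_even = statistics.median([10, 5, 19, 8, 3, 15])

# Example 21: multimode returns every value with the highest frequency.
modes_1 = statistics.multimode([77, 69, 74, 81, 71, 68, 74, 73])
modes_2 = statistics.multimode([77, 69, 68, 74, 81, 71, 68, 74, 73])

print(mean_price, median_odd, median_even, modes_1, modes_2)
```

`statistics.multimode` (available from Python 3.8) is used instead of `statistics.mode` because it reports all modes, which is what the bimodal second data set of Example 21 needs.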

2.4.2 Grouped Data

1. Mean

- Mean for population data:

$$\mu = \frac{\sum fx}{N}$$

- Mean for sample data:

$$\bar{x} = \frac{\sum fx}{n}$$

where x is the midpoint and f is the frequency of a class.

Example 22

The following table gives the frequency distribution of the number of
orders received each day during the past 50 days at the office of a
mail-order company. Calculate the mean.

Number of orders   f
10 - 12             4
13 - 15            12
16 - 18            20
19 - 21            14
                   n = 50

Solution:

Because the data set includes only 50 days, it represents a sample. The
value of $\sum fx$ is calculated in the following table:

Number of orders   f    x    fx
10 - 12             4   11    44
13 - 15            12   14   168
16 - 18            20   17   340
19 - 21            14   20   280
                   n = 50    Σfx = 832

The value of the sample mean is:

$$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$$

Thus, this mail-order company received an average of 16.64 orders per
day during these 50 days.

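
The grouped mean of Example 22 can be sketched in Python as below. The class midpoints are computed from the class limits, then each midpoint is weighted by its frequency; the function name `grouped_mean` is illustrative.

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data: sum of f*x over sum of f, with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


# Example 22: number of orders per day over 50 days.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
frequencies = [4, 12, 20, 14]

mean_orders = grouped_mean(classes, frequencies)
print(f"Mean number of orders per day: {mean_orders:.2f}")
```

Because every value in a class is represented by the class midpoint, the result is an approximation of the mean of the underlying raw data.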

2. Median

- Step 1: Construct the cumulative frequency distribution.
- Step 2: Decide the class that contains the median.

Class median is the first class whose cumulative
frequency is at least n/2.
- Step 3: Find the median by using the following formula:

$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$$

Where:
n = the total frequency
F = the total frequency before class median
i = the class width
$L_m$ = the lower boundary of the class median
$f_m$ = the frequency of the class median

Example 23

Based on the grouped data below, find the median:

Time to travel to work   Frequency
1 - 10                    8
11 - 20                  14
21 - 30                  12
31 - 40                   9
41 - 50                   7

Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work   Frequency   Cumulative Frequency
1 - 10                    8           8
11 - 20                  14          22
21 - 30                  12          34
31 - 40                   9          43
41 - 50                   7          50

$$\frac{n}{2} = \frac{50}{2} = 25$$

Class median is the 3rd class.

So, F = 22, $f_m$ = 12, $L_m$ = 20.5 (the boundary midway between 20 and 21) and i = 10

Therefore,

$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i = 20.5 + \left(\frac{25 - 22}{12}\right) 10 = 23$$

Thus, 25 persons take less than 23 minutes to travel to work and another
25 persons take more than 23 minutes to travel to work.

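
The three steps for the grouped median can be sketched in Python as follows. The sketch assumes whole-number class limits one unit apart, so the lower boundary of a class is its lower limit minus 0.5 and the class width is upper limit minus lower limit plus 1; the function name `grouped_median` is illustrative. With the lower boundary of the class 21 - 30 taken as 20.5, it returns 23 for the data of Example 23.

```python
def grouped_median(classes, frequencies):
    """Median of grouped data: L_m + ((n/2 - F) / f_m) * i."""
    n = sum(frequencies)
    target = n / 2
    cumulative = 0  # F: total frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= target:        # first class reaching n/2
            lower_boundary = lower - 0.5    # assumes integer class limits
            width = upper - lower + 1
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")


# Example 23: time to travel to work.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
frequencies = [8, 14, 12, 9, 7]

median_time = grouped_median(classes, frequencies)
print(median_time)
```

The loop plays the role of the cumulative frequency table: it stops at the first class whose cumulative frequency is at least n/2.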
3. Mode

- Mode is the value that has the highest frequency in a data set.
- For grouped data, class mode (or, modal class) is the class with
the highest frequency.
- To find mode for grouped data, use the following formula:

$$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$

Where:

$L_{mo}$ is the lower boundary of class mode

$\Delta_1$ is the difference between the frequency of class mode and
the frequency of the class before the class mode

$\Delta_2$ is the difference between the frequency of class mode and
the frequency of the class after the class mode

i is the class width

Example 24

Based on the grouped data below, find the mode

Time to travel to work   Frequency
1 - 10                    8
11 - 20                  14
21 - 30                  12
31 - 40                   9
41 - 50                   7

Solution:

Based on the table,

$L_{mo}$ = 10.5, $\Delta_1$ = (14 - 8) = 6, $\Delta_2$ = (14 - 12) = 2 and i = 10

$$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i = 10.5 + \left(\frac{6}{6 + 2}\right) 10 = 18$$

We can also obtain the mode by using the histogram;

Figure 2.19

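
The grouped-mode formula can be sketched in Python as below. Two assumptions are made that the notes do not state: class limits are whole numbers one unit apart (so the lower boundary is the lower limit minus 0.5), and a missing neighbouring class (before the first or after the last class) is treated as having frequency 0. The function name `grouped_mode` is illustrative. For the data of Example 24 the formula evaluates to 10.5 + (6/8)(10) = 18.

```python
def grouped_mode(classes, frequencies):
    """Mode of grouped data: L_mo + (d1 / (d1 + d2)) * i."""
    # Index of the class mode (class with the highest frequency).
    k = max(range(len(frequencies)), key=lambda j: frequencies[j])
    before = frequencies[k - 1] if k > 0 else 0                  # assumption
    after = frequencies[k + 1] if k < len(frequencies) - 1 else 0  # assumption
    d1 = frequencies[k] - before   # difference with the class before
    d2 = frequencies[k] - after    # difference with the class after
    lower, upper = classes[k]
    lower_boundary = lower - 0.5   # assumes integer class limits
    width = upper - lower + 1
    return lower_boundary + d1 / (d1 + d2) * width


# Example 24: time to travel to work.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
frequencies = [8, 14, 12, 9, 7]

mode_time = grouped_mode(classes, frequencies)
print(mode_time)
```

If two classes share the highest frequency, this sketch uses the first of them; a bimodal distribution would need each modal class to be handled separately.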
2.4.3 Relationship among mean, median & mode

- As discussed in the previous topic, a histogram or a frequency
distribution curve can assume either a skewed shape or a
symmetrical shape.
- Knowing the values of the mean, median and mode can give us
some idea about the shape of the frequency curve.

(1) For a symmetrical histogram and frequency curve with one
peak, the values of the mean, median and mode are identical
and they lie at the center of the distribution (Figure 2.20).

Figure 2.20: Mean, median, and mode for a symmetric histogram
and frequency distribution curve

(2) For a histogram and a frequency curve skewed to the right, the
value of the mean is the largest, that of the mode is the smallest
and the value of the median lies between these two.

Figure 2.21: Mean, median, and mode for a histogram and frequency
distribution curve skewed to the right

(3) For a histogram and a frequency curve skewed to the left, the
value of the mean is the smallest, that of the mode is the largest
and the value of the median lies between these two.

Figure 2.22: Mean, median, and mode for a histogram and
frequency distribution curve skewed to the left

2.5 Dispersion Measurement

- The measures of central tendency such as mean, median and
mode do not reveal the whole picture of the distribution of a
data set.
- Two data sets with the same mean may have completely
different spreads.
- The variation among the values of observations for one data
set may be much larger or smaller than for the other data set.

2.5.1 Ungrouped data

(1) Range
(2) Standard Deviation

2.5.2 Grouped data

(1) Range
(2) Standard deviation

2.5.3 Relative Dispersion Measurement

2.5.1 Ungrouped Data

1. Range

RANGE = Largest value - Smallest value

Example 25:

Find the range of production for this data set,

Solution:

Range = Largest value - Smallest value
= 267 277 - 49 651
= 217 626

- Disadvantages:
o It is influenced by outliers.
o It is based on two values only. All other values in a data set are
ignored.

2. Variance and Standard Deviation

- Standard deviation is the most used measure of dispersion.
- The value of the standard deviation tells how closely the values of a data
set are clustered around the mean.
- A lower value of the standard deviation indicates that the values of the data set
are spread over a relatively smaller range around the mean.
- A larger value of the standard deviation indicates that the values of the data set
are spread over a relatively larger range around the mean (far from the mean).
- The standard deviation is obtained by taking the positive square root of the variance:

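Variance and Standard Deviation

Population:

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$
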
Example 26

Let x denote the total production (in units) of a company.

Company   Production
A          62
B          93
C         126
D          75
E          34

Find the variance and standard deviation.

Solution:

Company   Production (x)   x²
A          62                3844
B          93                8649
C         126              15 876
D          75                5625
E          34                1156
          Σx = 390         Σx² = 35 150

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50$$

Since $s^2$ = 1182.50;

Therefore,

$$s = \sqrt{1182.50} = 34.3875$$

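
The shortcut formula for the sample variance can be checked against Python's standard `statistics` module. The sketch below uses the production data of Example 26; the function name `sample_variance` is illustrative.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance via the shortcut formula:
    (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Example 26: production of companies A to E.
production = [62, 93, 126, 75, 34]

s2 = sample_variance(production)
s = math.sqrt(s2)
print(f"variance = {s2:.2f}, standard deviation = {s:.4f}")

# Cross-check with the library routine, which uses the definitional
# formula based on squared deviations from the mean.
library_s2 = statistics.variance(production)
```

Both routes give the same variance; the shortcut form is convenient for hand calculation because it needs only the two column totals Σx and Σx².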

- The properties of variance and standard deviation:

(1) The standard deviation is a measure of variation of all values
from the mean.

(2) The value of the variance and the standard deviation are never
negative. Also, larger values of variance or standard deviation
indicate greater amounts of variation.

(3) The value of s can increase dramatically with the inclusion of
one or more outliers.

(4) The measurement units of variance are always the square of
the measurement units of the original data while the units of
standard deviation are the same as the units of the original
data values.

2.5.2 Grouped Data

1. Range

Range = Upper bound of last class - Lower bound of first class

Class      Frequency
41 - 50      1
51 - 60      3
61 - 70      7
71 - 80     13
81 - 90     10
91 - 100     6
Total       40

Upper bound of last class = 100.5
Lower bound of first class = 40.5
Range = 100.5 - 40.5 = 60

2. Variance and Standard Deviation

Population:

$$\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample:

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$

Example 27

Find the variance and standard deviation for the following data:

No. of orders   f
10 - 12          4
13 - 15         12
16 - 18         20
19 - 21         14
Total           n = 50

Solution:

No. of orders   f        x    fx     fx²
10 - 12          4       11    44     484
13 - 15         12       14   168    2352
16 - 18         20       17   340    5780
19 - 21         14       20   280    5600
Total           n = 50        832   14216

Variance,

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$$

Standard Deviation,

$$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$$

Thus, the standard deviation of the number of orders received at the
office of this mail-order company during the past 50 days is 2.75.
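
The grouped-data calculation of Example 27 can be sketched in Python as follows, using the class midpoints 11, 14, 17 and 20 and the sample formula. The function name `grouped_sample_variance` is illustrative.

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """Sample variance for grouped data:
    (sum of f*x^2 - (sum of f*x)^2 / n) / (n - 1)."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Example 27: number of orders per day (classes 10-12, 13-15, 16-18, 19-21).
midpoints = [11, 14, 17, 20]
frequencies = [4, 12, 20, 14]

s2 = grouped_sample_variance(midpoints, frequencies)
s = math.sqrt(s2)
print(f"variance = {s2:.4f}, standard deviation = {s:.2f}")
```

The intermediate totals are Σfx = 832 and Σfx² = 14216, the same column totals used in the hand calculation.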

2.5.3 Relative Dispersion Measurement

- Used to compare the dispersion of two or more distributions that have
different units, or
- to compare two or more distributions that have the same unit but very
different mean values.
- Also called modified coefficient or coefficient of variation,
CV.

$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)}$$

$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}$$

Example 28

Given the mean and standard deviation of monthly salary for two groups of
workers at ABC company (Group 1: 700 & 20 and
Group 2: 1070 & 20), find the CV for each group and determine which
group is more dispersed.

Solution:

$$CV_1 = \frac{20}{700} \times 100\% = 2.86\%$$

$$CV_2 = \frac{20}{1070} \times 100\% = 1.87\%$$

The monthly salary of group 1 workers is more dispersed than that of
group 2.
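
The comparison in Example 28 can be sketched in Python. The function name `coefficient_of_variation` is illustrative; it takes a standard deviation and a mean and returns the CV as a percentage.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: (standard deviation / mean) * 100."""
    return std_dev / mean * 100


# Example 28: monthly salary of two groups of workers.
cv_group1 = coefficient_of_variation(20, 700)
cv_group2 = coefficient_of_variation(20, 1070)

print(f"CV group 1 = {cv_group1:.2f}%")
print(f"CV group 2 = {cv_group2:.2f}%")
print("More dispersed:", "group 1" if cv_group1 > cv_group2 else "group 2")
```

Both groups have the same standard deviation, so the group with the smaller mean has the larger relative dispersion.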

2.6 Measure of Position

- Determines the position of a single value in relation to other
values in a sample or a population data set.

2.6.1 Ungrouped Data

1. Quartiles

2. Interquartile Range

2.6.2 Grouped Data

1. Quartile

2. Interquartile Range

1. Quartiles

- Quartiles are three summary measures that divide a ranked data
set into four equal parts.

- The 1st quartile is denoted as $Q_1$:

$$\text{Depth of } Q_1 = \frac{n + 1}{4}$$

- The 2nd quartile is the median of the data set, or $Q_2$.

- The 3rd quartile is denoted as $Q_3$:

$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4}$$
Example 29

1. The table below lists the total revenue for the 11 top tourism companies in
Malaysia.

109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5
86.8

Solution:

Step 1: Arrange the data in increasing order

76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7
121.2

Step 2: Determine the depth for $Q_1$ and $Q_3$

$$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{11 + 1}{4} = 3$$

$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(11 + 1)}{4} = 9$$

Step 3: Determine $Q_1$ and $Q_3$

76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7
121.2

$Q_1$ = 79.9

$Q_3$ = 103.5

2. The table below lists the total revenue for the top 12 tourism
companies in Malaysia.

109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3
98.0 103.5 86.8

Solution:

Step 1: Arrange the data in increasing order

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5
109.7 121.2

Step 2: Determine the depth for $Q_1$ and $Q_3$

$$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25$$

$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75$$

Step 3: Determine $Q_1$ and $Q_3$

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5
109.7 121.2

$Q_1$ = 79.4 + 0.25 (79.9 - 79.4) = 79.525

$Q_3$ = 98.0 + 0.75 (103.5 - 98.0) = 102.125
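
The depth method used in both parts of Example 29 can be sketched in code. The helper names `value_at_depth` and `quartiles` are illustrative assumptions. The sketch also assumes at least three observations, so that both depths fall inside the data.

```python
def value_at_depth(sorted_data, depth):
    """Value at a 1-based, possibly fractional depth, by linear interpolation."""
    whole = int(depth)      # position of the lower neighbour
    frac = depth - whole    # fractional part of the depth
    lower = sorted_data[whole - 1]
    if frac == 0:
        return lower
    upper = sorted_data[whole]
    return lower + frac * (upper - lower)


def quartiles(data):
    """Return (Q1, Q3) using depths (n + 1) / 4 and 3(n + 1) / 4."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_depth(ordered, (n + 1) / 4)
    q3 = value_at_depth(ordered, 3 * (n + 1) / 4)
    return q1, q3


data11 = [109.7, 79.9, 121.2, 76.4, 80.2, 82.1, 79.4, 89.3, 98.0, 103.5, 86.8]
data12 = data11 + [74.1]

print(quartiles(data11))
print(quartiles(data12))
```

Other textbooks and software packages define quartile positions differently, so their results can differ slightly from this depth rule.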

2. Interquartile Range

- The difference between the third quartile and the first quartile
for a data set.

$$IQR = Q_3 - Q_1$$

Example 30

Referring to the second data set in Example 29 (12 companies), calculate
the IQR.

Solution:

$$IQR = Q_3 - Q_1 = 102.125 - 79.525 = 22.6$$

2.6.2 Grouped Data

1. Quartiles

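- From the median formula, we can get the equations for $Q_1$ and $Q_3$ as follows:

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i \; ; \qquad Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$$

where $L$ is the lower boundary of the quartile class, $F$ is the cumulative
frequency of the classes before the quartile class, $f$ is the frequency of
the quartile class and $i$ is the class width.
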
Example 31

Refer to Example 23. Find $Q_1$ and $Q_3$.
Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
 1 - 10                       8                 8
11 - 20                      14                22
21 - 30                      12                34
31 - 40                       9                43
41 - 50                       7                50

2nd Step: Determine $Q_1$ and $Q_3$

$$\text{Class } Q_1 = \frac{n}{4} = \frac{50}{4} = 12.5$$

Class $Q_1$ is the 2nd class.

Therefore,

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right) 10 = 13.7143$$

$$\text{Class } Q_3 = \frac{3n}{4} = \frac{3(50)}{4} = 37.5$$

Class $Q_3$ is the 4th class.

Therefore,

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right) 10 = 34.3889$$

2. Interquartile Range

$$IQR = Q_3 - Q_1$$

Example 32:

Refer to example 31, calculate the IQR.

Solution:

$$IQR = Q_3 - Q_1 = 34.3889 - 13.7143 = 20.6746$$
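
The grouped-data calculations in Examples 31 and 32 can be reproduced with a short sketch. The function name `grouped_quantile` is an illustrative assumption. The lower class boundaries (0.5, 10.5, ...) are assumed by extending the 10.5 and 30.5 used in the worked solution to the other classes.

```python
def grouped_quantile(lower_bounds, freqs, width, position):
    """Interpolate the value at a cumulative position (n/4 for Q1, 3n/4 for Q3)."""
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= position:
            # L + ((position - F) / f) * i
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")


# Travel-time classes 1-10, 11-20, ..., 41-50 (assumed lower boundaries)
lower_bounds = [0.5, 10.5, 20.5, 30.5, 40.5]
freqs = [8, 14, 12, 9, 7]
n = sum(freqs)

q1 = grouped_quantile(lower_bounds, freqs, 10, n / 4)
q3 = grouped_quantile(lower_bounds, freqs, 10, 3 * n / 4)
iqr = q3 - q1
print(round(q1, 4), round(q3, 4), round(iqr, 4))
```

The loop finds the quartile class by accumulating frequencies, which mirrors reading the cumulative frequency column of the table.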

2.7 Measure of Skewness

- Used to determine the skewness of the data (symmetric, left skewed or
right skewed).
- Also called the Skewness Coefficient or Pearson Coefficient of
Skewness:

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{s}$$

- If $S_k$ is positive, the distribution is right skewed.
- If $S_k$ is negative, the distribution is left skewed.
- If $S_k$ = 0, the distribution is symmetric.
- If $S_k$ takes a value in (-0.9999, -0.0001) or (0.0001,
0.9999), the distribution is approximately symmetric.

Example 33

The length of stay of cancer patients warded in Hospital Seberang Jaya is
recorded in a frequency distribution. From the record, the mean is 28 days,
the median is 25 days and the mode is 23 days. The standard deviation is
4.2 days.
a. What is the type of distribution?
b. Find the skewness coefficient.

Solution:

a. This distribution is right skewed because the mean is the largest value.

b. Using the mode or the median:

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$$

OR

$$S_k = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$$

So, from the $S_k$ value, this distribution is right skewed.

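The two forms of the Pearson skewness coefficient used in Example 33 can be checked with a small sketch. The function names are illustrative assumptions.

```python
def skew_from_mode(mean, mode, std_dev):
    """Pearson skewness coefficient based on the mode."""
    return (mean - mode) / std_dev


def skew_from_median(mean, median, std_dev):
    """Pearson skewness coefficient based on the median."""
    return 3 * (mean - median) / std_dev


# Values from Example 33: mean 28, median 25, mode 23, s = 4.2
sk_mode = skew_from_mode(28, 23, 4.2)
sk_median = skew_from_median(28, 25, 4.2)
print(round(sk_mode, 4), round(sk_median, 4))
```

Both coefficients are positive, which is consistent with the right-skewed conclusion in the example.
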
Exercise 2:

1. A survey research company asks 100 people how many times they have been to
the dentist in the last five years. Their grouped responses appear below.

Number of Visits    Number of Responses
 0 - 4                      16
 5 - 9                      25
10 - 14                     48
15 - 19                     11
What are the mean and variance of the data?

2. A researcher asked 25 consumers: How much would you pay for a television
adapter that provides Internet access? Their grouped responses are as follows:

Amount ($)     Number of Responses
  0 - 99                2
100 - 199               2
200 - 249               3
250 - 299               3
300 - 349               6
350 - 399               3
400 - 499               4
500 - 999               2
Calculate the mean, variance, and standard deviation.

3. The following data give the number of pairs of shoes sold per day by a
particular shoe store in the last 20 days.
85 90 89 70 79 80 83 83 75 76
89 86 71 76 77 89 70 65 90 86

Calculate the
a. mean and interpret the value.
b. median and interpret the value.
c. mode and interpret the value.
d. standard deviation.

4. The following data show the serving time (in minutes) for 40
customers in a post office:

2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8
3.2 2.9 4.0 3.0 3.8 2.5 2.3 3.5
2.1 3.1 3.6 4.3 4.7 2.6 4.1 3.1
4.6 2.8 5.1 2.7 2.6 4.4 3.5 3.0
2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4

a. Construct a frequency distribution table with a class width of 0.5.
b. Construct a histogram.
c. Calculate the mode and median of the data.
d. Find the mean of serving time.
e. Determine the skewness of the data.
f. Find the first and third quartile value of the data.
g. Determine the value of interquartile range.

5. In a survey of a class of final semester students, the following data
were obtained on the number of text books owned.

Number of students    Number of text books owned
        12                       5
         9                       5
        11                       3
        15                       2
        10                       1
         8                       0

Find the average number of text books for the class. Use the weighted mean.

6. The following data represent the ages of 15 people buying lift tickets at a ski
area.

15 25 26 17 38 16 60 21
30 53 28 40 20 35 31

Calculate the quartiles and the interquartile range.

7. A student scores 60 on a mathematics test that has a mean of 54 and a standard
deviation of 3, and she scores 80 on a history test with a mean of 75 and a
standard deviation of 2. On which test did she perform better?

8. The following table gives the distribution of the share price for ABC
Company, which was listed in BSKL in 2005.

Price (RM)    Frequency
12 - 14            5
15 - 17           14
18 - 20           25
21 - 23            7
24 - 26            6
27 - 29            3

Find the mean, median and mode for this data.
