Chapter 2
Organizing Data
Learning Outcome
Organize data in an intelligible manner from a
probabilistic and statistical point of view.
Use statistical software to do simple
exploratory data analysis.
Contents
2.1 Some Definitions
2.2 Organizing and Graphing Qualitative Data
2.2.1 Frequency distributions for qualitative data
2.2.2 Relative frequency and percentage distributions
2.2.3 Graphical presentation of qualitative data
2.3 Organizing and Graphing Quantitative Data
2.3.1 Small data set
Dotplot displays
Stem-and-leaf displays
2.3.2 Single-Valued Classes
2.3.3 Grouped data
Frequency Distribution for quantitative data
Relative frequency and percentage distributions
Histogram
Polygon
2.3.4 Cumulative frequency distribution
Ogive / Cumulative frequency curve
Variable
Quantitative (numerical, measurable)
- Discrete: takes only a countable number of values, with no intermediate values, e.g. number of trials before success
- Continuous: can take any value within a range, e.g. height 167.5 cm
Qualitative (categorical, not measurable)
- Nominal: refers to a characteristic with no natural order, e.g. gender, race, yes/no
- Ordinal: categories can be ranked in order, e.g. preference, importance
2.1 Some Definitions
Raw Data
Data in the sequence in which they are collected
and before they are processed or ranked.
Examples:
Table 1: The weights of 20 students in kg (Quantitative
raw data)
61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71
Table 2: The grades of UCCM 1213 of 20 students
(Qualitative raw data)
A B C A C B B A B C
B A B B B A C D D B
Arrays
An arrangement of numerical raw data in
ascending order or descending order of
magnitude.
Example:
60 61 62 63 64 65 65 66 67 67
68 68 69 69 70 70 71 71 73 74
Ungrouped Data
Contains information on each member of a
sample or population individually.
Examples:
Table 1 and Table 2.
Grouped Data
Data presented in classes or intervals.
Example:
UCCM1213 scores       10 – 12   13 – 15   16 – 18   19 – 21
Number of students       4         12        20        14
2.2 Organizing and Graphing Qualitative Data
2.2.1 Frequency Distribution
2.2.2 Relative Frequency and Percentage Distributions
2.2.3 Graphical Presentation
Frequency Distribution for Qualitative Data
A tabular arrangement that lists all categories and
the number of elements that belong to each of the
categories.
Example 2.1
A sample of 25 students who were planning to go to college was taken.
The course each student intended to choose is listed below:
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
Construct a frequency distribution table for these data.
Solution:
Course Tally Frequency
Biotech \\\\ \\\
Business
Engineering
Infotech
Others
Total:
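The tallying in Example 2.1 can also be done programmatically. The following is a minimal Python sketch (Python is an assumption here; the slides themselves use Excel and R), with the 25 responses typed in from the example. The variable names are illustrative only.

```python
from collections import Counter

# Intended courses of the 25 students in Example 2.1.
courses = [
    "Engineering", "Infotech", "Engineering", "Business", "Business",
    "Business", "Business", "Other", "Biotech", "Biotech",
    "Biotech", "Biotech", "Infotech", "Biotech", "Biotech",
    "Other", "Business", "Engineering", "Business", "Other",
    "Engineering", "Biotech", "Biotech", "Other", "Infotech",
]

# Counter tallies how many times each category occurs.
frequency = Counter(courses)

print(f"{'Course':<12}{'Frequency':>10}")
for course in sorted(frequency):
    print(f"{course:<12}{frequency[course]:>10}")
print(f"{'Total':<12}{sum(frequency.values()):>10}")
```

The total printed in the last row should equal the sample size, which is a quick check that no response was missed.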
Frequency Distribution for Qualitative Data (Excel)
[Figures: Excel screenshots of the steps, including a 1st drag and a 2nd drag]
Relative Frequency and Percentage Distributions
A tabular arrangement that lists the relative frequencies and percentages for all categories.
\[ \text{Relative frequency of a category} = \frac{\text{frequency of the category}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
\[ \text{Percentage} = (\text{Relative frequency}) \times 100\% \]
Example 2.2
Determine the relative frequency and percentage distributions
for the data in Example 2.1.
Solution:
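The relative frequency and percentage formulas can be applied to the Example 2.1 data as follows. This is a minimal Python sketch (the choice of Python and all variable names are assumptions, not part of the original slides).

```python
from collections import Counter

# The 25 responses from Example 2.1, split on whitespace.
data = """
Engineering Infotech Engineering Business Business
Business Business Other Biotech Biotech
Biotech Biotech Infotech Biotech Biotech
Other Business Engineering Business Other
Engineering Biotech Biotech Other Infotech
""".split()

frequency = Counter(data)
total = sum(frequency.values())  # sum of all frequencies

# Relative frequency = f / sum of f; percentage = relative frequency x 100.
relative = {course: f / total for course, f in frequency.items()}
percentage = {course: 100 * r for course, r in relative.items()}

print(f"{'Course':<12}{'Rel. freq.':>12}{'Percentage':>12}")
for course in sorted(frequency):
    print(f"{course:<12}{relative[course]:>12.2f}{percentage[course]:>11.0f}%")
```

The relative frequencies should sum to 1 and the percentages to 100, apart from rounding.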
Graphical Presentation of Qualitative Data
Bar Chart (bar graph)
A graph made of bars whose heights represent the
frequencies of the respective categories.
Example 2.3
Construct a bar chart for the data in
Example 2.1.
Example 2.3 (cont'd)
[Figure: bar chart for the data in Example 2.1]
Pie Chart
A circle divided into portions that represent the
relative frequencies or percentages of a
population or a sample belonging to different
categories.
The first sector starts at 12 o'clock.
The sectors proceed clockwise from the largest to the smallest.
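Since a full circle is 360 degrees, each sector's angle is its relative frequency times 360. The sketch below, in Python, lays out the sectors clockwise from 12 o'clock in order of decreasing size. The function name `pie_sectors` is hypothetical, and the counts are those obtained by tallying the Example 2.1 data.

```python
def pie_sectors(frequency):
    """Return (category, start, end) angles in degrees, clockwise from 12 o'clock."""
    total = sum(frequency.values())
    # Largest sector first, then progressively smaller ones.
    ordered = sorted(frequency.items(), key=lambda item: item[1], reverse=True)
    sectors = []
    start = 0.0
    for category, f in ordered:
        angle = 360 * f / total  # relative frequency x 360 degrees
        sectors.append((category, start, start + angle))
        start += angle
    return sectors


# Frequencies tallied from the Example 2.1 data.
counts = {"Biotech": 8, "Business": 6, "Engineering": 4, "Infotech": 3, "Other": 4}
sectors = pie_sectors(counts)
for category, start, end in sectors:
    print(f"{category:<12}{start:7.1f} to {end:6.1f} degrees")
```

Categories with equal frequencies keep their input order here; how ties are ordered is a presentation choice.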
Example 2.4
[Figure: pie chart]
Using R
Install R from www.r-project.org
R is an open-source programming language for
statistical computing.
Bar chart (R)
[Figure: bar chart titled "Majors"; vertical axis: No. of students (0 to 8); categories: Biotech, Business, Engineering, Infotech, others]
Pie chart (R)
[Figure: pie chart titled "majors" with sectors Biotech, Business, others, Engineering, Infotech]
2.3 Organizing and Graphing Quantitative Data
Small data set
Dotplot
Stem-and-leaf displays
Single-Valued Classes
Grouped data
Frequency Distribution for quantitative data
Relative frequency and percentage distributions
Histogram
Polygon
Cumulative frequency distribution
Ogive / Cumulative frequency curve
Small data set
• Dotplot display
• Stem and leaf display
Dotplot display
Displays the data of a sample by
representing each piece of data with a dot
positioned along a scale (horizontal scale or
vertical scale).
The frequency of the values is represented
along the other scale.
Example 2.5
A sample of 19 exam grades was randomly
selected from a large class:
76 74 82 96 66 76 78 72 52 68
86 84 62 76 78 92 82 74 88
Construct a dotplot of these data.
Solution (Example 2.5)
[Figure: dotplot; horizontal axis: exam grades, marked at 60, 70, 80, 90]
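A rough text version of a dotplot can be produced by printing one dot per observation next to each distinct value. The Python sketch below assumes the exam grades of Example 2.5 are the 19 two-digit values obtained by splitting the run of digits in the example, and it uses a vertical scale, which the definition allows.

```python
from collections import Counter

# The 19 exam grades of Example 2.5 (assumed split into two-digit values).
grades = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68,
          86, 84, 62, 76, 78, 92, 82, 74, 88]

counts = Counter(grades)

# One row per distinct grade (the scale); the dots show its frequency.
for grade in sorted(counts):
    print(f"{grade:>3} | {'.' * counts[grade]}")
```

Unlike a drawn dotplot, this listing does not leave space for grades that never occur, so gaps in the data are less visible.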
Stem and leaf display
Each value is divided into two portions:
a stem & a leaf. The leaves for each stem are
shown separately in a display.
Note:
It is constructed only for quantitative data.
It has an advantage over a frequency distribution: no information on individual observations is lost.
Examples:
Female      Male
   26 | 3 | 179
 1145 | 4 | 57789
06889 | 5 | 367
  026 | 6 | 7

11 | 179
12 | 57789
13 | 367
Key 11 | 1 = 11.1
To construct a stem and leaf display:
1) Arrange the data in order.
2) Separate the data according to the classes (first digit).
3) Construct a plot with the leading digit as the stem and the trailing digit as the leaf.
Example 2.6
The following are the scores of 30 college
students on a statistics test.
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Construct a stem and leaf display for these
data.
Solution (Example 2.6)
Put the data in order:
50 52 57 61 64 65 68 69 71 71 72 72 75 76 77 79 79
80 81 83 84 86 87 87 92 92 93 95 96 98
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
Key 5|0 = 50
• The distribution peaks in the center and there are no gaps in the data.
• For 9 of the 30 students, the scores were between 71 and 79.
• The scores range from a minimum of 50 to a maximum of 98 on the statistics test.
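A stem and leaf display for the Example 2.6 scores can also be built programmatically. The Python sketch below assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Scores of the 30 students in Example 2.6.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    display = defaultdict(list)
    for value in sorted(values):       # step 1: arrange the data in order
        stem, leaf = divmod(value, 10) # steps 2 and 3: split into stem and leaf
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(scores)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
print("Key: 5 | 0 = 50")
```

For data such as the three- and four-digit rents in Example 2.7, the split between stem and leaf would have to be chosen differently (for instance, hundreds as stems), which this sketch does not handle.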
Example 2.7
The following data are the monthly rents paid by a
sample of 30 households selected from a city.
429 585 732 675 550 989 1020
620 750 660 540 578 956 1030 1070
930 871 765 880 975 650 1020 950
840 780 870 900 800 750 820
Construct a stem-and-leaf display for these data.
Solution (Example 2.7)
Single-Valued Classes
Single-valued classes are used if the
observations in a data set assume only a few
distinct values (that is, each class is a
single value rather than an interval).
They are useful for discrete data with only
a few possible values.
Example 2.8
A sample of 40 randomly selected households from a
city produced the following data on the number of
vehicles owned:
Construct a frequency distribution table for these data.
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Vehicles owned    Number of households (f)
0                 2
1                 18
2                 11
3                 4
4                 3
5                 2
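The table for Example 2.8 can be checked with a short count. The Python sketch below builds one class per value between the smallest and largest observation, so a value that never occurs would still appear with frequency 0; the variable names are illustrative.

```python
from collections import Counter

# Number of vehicles owned by the 40 households in Example 2.8.
vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
            1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
            2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
            4, 2, 1, 1, 2, 1, 1, 4, 1, 3]

counts = Counter(vehicles)

# One single-valued class per integer from the minimum to the maximum.
table = [(value, counts[value]) for value in range(min(vehicles), max(vehicles) + 1)]

print("Vehicles owned   Number of households (f)")
for value, f in table:
    print(f"{value:^14}   {f:^24}")
```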
Example 2.8 (Excel)
[Figures: Excel screenshots of the steps]
Frequency Distribution for quantitative data
Class
An interval that includes all the values that fall between two numbers, the lower and upper limits.
Class limits
The endpoints of each interval.
Class Boundary
The dividing line between two classes. It is given by the midpoint of the upper limit of one class and the lower limit of the next higher class.
Class width / class size
Class width is the difference between the upper and lower class boundaries.
\[ \text{Class width} = \text{Upper boundary} - \text{Lower boundary} \]
Class mark / class midpoint
Class mark is the midpoint of the class interval.
\[ \text{Class mark} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2} \]
HIV Positive Cases in Malaysia by age groups
Age group   No. of HIV +ve cases   Class boundaries          Class width   Class midpoint
2-12        532                    1.5 to less than 12.5     11            7
13-19       1140
20-29       27995
30-39       34770
40-49       12580
Fourth class =
Lower limit of 3rd class =
Upper limit of 3rd class =
Lower boundary of 3rd class =
Upper boundary of 3rd class =
19.5 - 12.5 =
(13 + 19) / 2 =
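The definitions of class boundary, class width and class mark can be applied mechanically to the age groups in the table. The Python sketch below assumes adjacent class limits are 1 unit apart (12 and 13, 19 and 20, and so on), so each boundary lies 0.5 beyond the limit; the function name `describe_class` is hypothetical.

```python
def describe_class(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, width, midpoint) of one class.

    gap is the distance between one class's upper limit and the next
    class's lower limit (assumed to be 1 for the age groups here).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, width, midpoint


age_groups = [(2, 12), (13, 19), (20, 29), (30, 39), (40, 49)]
summary = [describe_class(lower, upper) for lower, upper in age_groups]

for (lower, upper), (lb, ub, width, mid) in zip(age_groups, summary):
    print(f"{lower}-{upper}: {lb} to less than {ub}, width {width}, midpoint {mid}")
```

The first row reproduces the values already filled in on the slide (1.5 to less than 12.5, width 11, midpoint 7).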
Constructing frequency distribution tables
Determine the number of classes (k), which usually varies from 5 to 20, depending mainly on the number of observations in the data set.
Find $2^k$, where k is the smallest number such that $2^k$ is greater than the number of observations (n).
Determine the class interval or width (i). The classes must cover at least the distance from the smallest value (L) in the raw data up to the largest value (H).
\[ \text{Approximate class width} = \frac{H - L}{\text{number of classes}} \]
Determine the lower limit of the first class or
the starting point.
Any convenient number that is equal to or less
than the smallest value in the data set can be
used as the lower limit of the first class.
Example 2.9
The data gives the 1999 total payrolls
(rounded to millions) for all 30 major league
baseball teams.
Construct a frequency distribution table.
Team Total Payroll Team Total Payroll
(million dollars) (million dollars)
Anaheim 51 Milwaukee 43
Arizona 70 Minnesota 16
Atlanta 79 Montreal 15
Baltimore 75 New York Mets 72
Boston 72 New York Yankees 92
Chicago Cubs 55 Oakland 25
Chicago White Sox 25 Philadelphia 30
Cincinnati 38 Pittsburgh 24
Cleveland 74 St. Louis 46
Colorado 54 San Diego 47
Detroit 37 San Francisco 46
Florida 15 Seattle 45
Houston 56 Tampa Bay 38
Kansas City 17 Texas 81
Los Angeles 77 Toronto 49
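Solution (Example 2.9)
n = 30. Since $2^4 = 16$ and $2^5 = 32 > 30$, k = 5.
Min = 15, Max = 92.
\[ \text{Approximate class width} = \frac{92 - 15}{5} = 15.4 \]
So, the class width is 15 units.
With a width of 15 starting at 15, a sixth class (90-104) is needed to include the maximum of 92.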
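The rule for choosing the number of classes and the approximate class width can be written out as a short calculation. This Python sketch uses n = 30, L = 15 and H = 92 from Example 2.9; the variable names are illustrative.

```python
n = 30                      # number of observations
smallest, largest = 15, 92  # L and H in the raw data

# Find the smallest k such that 2**k is greater than n.
k = 1
while 2 ** k <= n:
    k += 1

approx_width = (largest - smallest) / k

print(f"k = {k} (2**{k} = {2 ** k} > {n})")
print(f"approximate class width = {approx_width:.1f}")
```

The approximate width is only a guide; it is normally rounded to a convenient number, as the solution does when it uses 15.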
Total payroll (million dollars)   Tally   f
15-29
30-44
45-59
60-74
75-89
90-104
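Once the classes are fixed, each observation can be assigned to its class by integer division. The Python sketch below assumes the six classes of width 15 starting at 15 that are listed in the table, and uses the 30 payrolls from Example 2.9; the variable names are illustrative.

```python
# 1999 total payrolls (million dollars) of the 30 teams in Example 2.9.
payrolls = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54, 37, 15, 56, 17, 77,
            43, 16, 15, 72, 92, 25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

first_lower_limit = 15
class_width = 15
number_of_classes = 6

# Class index = how many whole class widths the value lies above the start.
frequency = [0] * number_of_classes
for value in payrolls:
    frequency[(value - first_lower_limit) // class_width] += 1

for index, f in enumerate(frequency):
    lower = first_lower_limit + index * class_width
    upper = lower + class_width - 1
    print(f"{lower}-{upper}: {f}")
print(f"Total: {sum(frequency)}")
```

The integer-division shortcut only works because all classes have the same width; unequal classes would need an explicit comparison against each class's limits.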
Relative frequency and Percentage distributions
\[ \text{Relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f} \]
\[ \text{Percentage} = (\text{Relative frequency}) \times 100\% \]
Example 2.10
Calculate the relative frequency and percentage distributions for the data in Example 2.9.
Solution (Example 2.10)
Total payroll (million dollars)   Tally   f   Class boundaries   Relative frequency   Percentage
15-29
30-44
45-59
60-74
75-89
90-104
Histogram and Polygon
Grouped (quantitative) data can be displayed
in a histogram or a polygon.
Histogram
Three types of histogram
Frequency histogram
Relative frequency histogram
Percentage histogram
Procedures to draw a histogram:
1) Mark the class boundaries of each interval on the horizontal axis.
2) Mark the frequencies (or relative frequencies or percentages) on the vertical axis.
3) Draw a bar for each class so that its height represents the frequency of that class, with no gaps between bars.
4) Label the histogram.
A frequency histogram consists of a set of rectangles having
- bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval sizes;
- areas proportional to the class frequencies.
If the class intervals all have equal size, the heights of the rectangles are proportional to the class frequencies.
Otherwise, the heights of the rectangles must be adjusted:
\[ \text{Adjusted frequency} = \frac{\text{Standard class width}}{\text{Class width}} \times \text{Frequency} \]
Example 2.10
The frequency distribution gives the weight of
35 objects, measured to the nearest kg. Draw
a histogram to illustrate the data.
Weight (kg)   6 – 8   9 – 11   12 – 17   18 – 20   21 – 29
Frequency       4        6        10         3         12
Solution:
\[ \text{Adjusted frequency} = \frac{\text{Standard class width}}{\text{Class width}} \times \text{Frequency} \]
Weight (kg)   Class width   Frequency   Height of rectangle (adjusted frequency)
6 – 8         3             4           4
9 – 11        3             6           6
12 – 17       6             10
18 – 20       3             3
21 – 29       9             12
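The adjusted frequencies for classes of unequal width can be computed directly from the formula. The Python sketch below assumes a standard class width of 3 kg, which is consistent with the first two rows of the table (where the height equals the frequency); the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for the weight classes above.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]
standard_width = 3  # assumed: the width of the narrowest classes

heights = []
for lower, upper, f in classes:
    # Weights are to the nearest kg, so boundaries are lower - 0.5 and upper + 0.5.
    width = (upper + 0.5) - (lower - 0.5)
    heights.append(f * standard_width / width)

for (lower, upper, f), height in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {f}, height {height:g}")
```

With these heights, the area of each rectangle (height times width) stays proportional to the class frequency.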
[Figure: histogram of adjusted frequency against weight (kg), drawn on the class boundaries]
Shape of Histogram
[Figures: example histograms illustrating the following shapes: symmetric, bell shape, uniform, left skewed, right skewed, unimodal, bimodal]
Polygon
Polygon is a line graph formed by joining the
midpoints of the tops of successive bars in a
histogram.
Next, we mark two more classes (with zero
frequencies), one at each end, and mark the
midpoints.
Three types of polygon:
Frequency polygon
Relative frequency polygon
Percentage polygon
Example 2.11
Weight (kg)   Class mark   No. of students
60 – 62       61           3
63 – 65       64           4
66 – 68       67           5
69 – 71       70           6
72 – 74       73           2
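The points joined by a frequency polygon are the class marks paired with the frequencies, plus one zero-frequency class at each end. A minimal Python sketch for the Example 2.11 data follows; it assumes equal class widths so the extra class marks sit one width beyond the first and last marks.

```python
class_marks = [61, 64, 67, 70, 73]
frequencies = [3, 4, 5, 6, 2]

# Distance between successive class marks (equal class widths assumed).
width = class_marks[1] - class_marks[0]

# Add a zero-frequency class at each end so the polygon returns to the axis.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + width, 0)]
)

for mark, f in points:
    print(f"({mark}, {f})")
```

Joining these points in order with straight lines gives the polygon; dividing each frequency by the total would give the relative frequency polygon instead.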
[Figures: histogram and frequency polygon for Example 2.11; vertical axis: No. of students (0 to 7); horizontal axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76]
Example 2.12
Weight (kg)   Class mark   Frequency   Relative frequency   Percentage
60 – 62       61           3           0.15                 15
63 – 65       64           4           0.20                 20
66 – 68       67           5           0.25                 25
69 – 71       70           6           0.30                 30
72 – 74       73           2           0.10                 10
                           Σf = 20     1.00                 100 %
[Figures: relative frequency histogram and relative frequency polygon for Example 2.12; vertical axis: relative frequency (0 to 0.35); horizontal axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76]
[Figures: percentage histogram and percentage polygon for Example 2.12; vertical axis: percentage (0 to 35); horizontal axis: Weight (kg), marked at 58, 61, 64, 67, 70, 73, 76]
Cumulative frequency distribution
A table that presents the total number of
values that fall below the upper boundary of
each class.
It is constructed for quantitative data only.
\[ \text{Cumulative relative frequency} = \frac{\text{Cumulative frequency of that class}}{\text{sum of all frequencies in the data set}} \]
\[ \text{Cumulative percentage} = (\text{Cumulative relative frequency}) \times 100\% \]
Example 2.13
Weight (kg)   f        Weight (kg)   Cumulative frequency
                       < 59.5
60 – 62       3        < 62.5
63 – 65       4        < 65.5
66 – 68       5        < 68.5
69 – 71       6        < 71.5
72 – 74       2        < 74.5
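Cumulative frequencies are running totals of the class frequencies. The Python sketch below computes them for the Example 2.13 classes, together with the cumulative relative frequencies and percentages defined above; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [3, 4, 5, 6, 2]
# Lower boundary of the first class, then the upper boundary of every class.
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]

# No observation lies below 59.5, so the running total starts at 0.
cumulative = [0] + list(accumulate(frequencies))
total = cumulative[-1]

cumulative_relative = [cf / total for cf in cumulative]
cumulative_percentage = [100 * r for r in cumulative_relative]

for boundary, cf, rel, pct in zip(boundaries, cumulative,
                                  cumulative_relative, cumulative_percentage):
    print(f"< {boundary}: {cf:>2}  {rel:.2f}  {pct:.0f}%")
```

The last cumulative frequency always equals the total number of observations, here 20.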
Example 2.14
Weight (kg)   Cumulative relative frequency   Cumulative percentage
Ogive / Cumulative frequency curve
A curve drawn for the cumulative frequency
distribution by joining the dots marked above the
upper boundaries of classes at heights equal to the
cumulative frequencies of respective classes.
Note:
1. The ogive starts at the lower boundary of the first
class and ends at the upper boundary of the last class.
2. If relative cumulative frequency is used in place of
cumulative frequency, the graph is called relative
cumulative frequency curve or percentage ogive.
Example 2.15
Draw an ogive for the data in Example 2.13.
Estimate from the ogive:
a) the number of students whose weight was less than 68.3 kg;
b) the value of X, if 20% of the students weighed X kg or more.
[Figure: cumulative frequency curve; vertical axis: cumulative frequency (0 to 25); horizontal axis: Weight (kg), marked at 59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
Solution:
a)
b)
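Readings from an ogive can be approximated numerically by linear interpolation between the plotted points. The Python sketch below assumes the points are joined by straight line segments, so the results may differ slightly from readings taken off a hand-drawn smooth curve; the function name `interpolate` is hypothetical.

```python
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
cumulative = [0, 3, 7, 12, 18, 20]


def interpolate(x, xs, ys):
    """Piecewise-linear estimate of y at x; xs must be increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the ogive")


# a) Number of students weighing less than 68.3 kg.
below_68_3 = interpolate(68.3, boundaries, cumulative)

# b) 20% weigh X kg or more, so 80% of the students weigh less than X.
target = 0.8 * cumulative[-1]
x_value = interpolate(target, cumulative, boundaries)

print(f"a) about {below_68_3:.1f} students weigh less than 68.3 kg")
print(f"b) X is about {x_value:.1f} kg")
```

Part b) reads the ogive in the reverse direction, from a cumulative frequency back to a weight, which is why the two lists are swapped in the second call.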