02 Organizing Data


Uploaded by

Benjito Dominico


Organizing Data

When a set of data is first obtained, it is in raw form. To make sense of the data, data handlers must
organize it in some meaningful way, such as arranging it in an array. The most convenient method of
organizing data is to construct a tally of scores and then a frequency distribution. Consider the illustration
below, which lists the ages of the top 50 wealthiest people in the world according to Forbes Magazine.

Looking at the raw data alone gives little idea of what the numbers represent, so little use can be derived
from it. The succeeding sections organize the data into an array, then a tally, and finally a frequency
distribution.

Array
An array is an arrangement of the data in ascending or descending order. The data above is shown below
arranged in ascending order.
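
As a minimal sketch of this step in Python, the snippet below sorts a short list of ages into an array. The
values in `ages` are hypothetical placeholders chosen for illustration; they are not the Forbes data from the
example.

```python
# Hypothetical sample of ages (placeholder values, not the Forbes data).
ages = [56, 42, 90, 37, 68, 42, 75]

# An array is the data arranged in ascending or descending order.
ascending = sorted(ages)
descending = sorted(ages, reverse=True)

# Once sorted, the lowest and highest values sit at the two ends.
lowest, highest = ascending[0], ascending[-1]

print(ascending)        # [37, 42, 42, 56, 68, 75, 90]
print(lowest, highest)  # 37 90
```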

The data is now less chaotic than in its raw form, and some basic information can be read from it. The lowest
and highest values of the set are clearly visible, which gives an idea of the overall range of the ages of the
50 people in the set.

Tally
A tally identifies all the unique values in a given data set and records how often each value occurs.
The result of tallying is the frequency of each item within the given data. Shown below is the tally of the
items in the above data and the resulting frequencies.

Note that the tallying has identified 32 unique items in the data set and their frequencies of occurrence.
Although meaningful information can already be derived from a tally of scores, a tally table quickly grows
long and unwieldy for large data sets. To make it more manageable, we can convert the tally table into a
frequency distribution table, as illustrated next.
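
A tally can be sketched in Python with `collections.Counter` from the standard library. The `ages` list is
again a hypothetical placeholder sample, not the Forbes data, and the printed tally marks are only one possible
way to display the result.

```python
from collections import Counter

# Hypothetical sample of ages (placeholder values, not the Forbes data).
ages = [56, 42, 90, 37, 68, 42, 75, 56, 42]

# Counter maps each unique value to its frequency of occurrence.
tally = Counter(ages)

# Print one row per unique value: the value, its tally marks, its frequency.
for value in sorted(tally):
    print(value, "|" * tally[value], tally[value])
```

The number of unique items is `len(tally)`, and the frequencies always add up to the number of items in the
data set.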

Frequency Distribution
As mentioned above, the most convenient method of organizing data is a frequency distribution. In this
method the items from the raw data are organized in table form using classes and frequencies.
Consider the frequency distribution table shown below for the data on the ages of the 50 wealthiest people in
the world.

The first thing to note is that a frequency distribution table has several parts, namely the class intervals,
the frequencies, the class size, the class limits, and the class boundaries. Each of these is defined below.
Class Intervals - These are the ranges of scores into which the items from the data set are categorized.
Class Limits - The class limits are the numbers that define the range of each class interval. In the above
example, the numbers 35, 42, 49, 56, 63, 70, 77, and 84 in the left column of the class intervals are the
lower-class limits (LCL) of the frequency distribution, while the numbers 41, 48, 55, 62, 69, 76, 83, and 90
in the right column are the upper-class limits (UCL).
Class Size (𝒊) - The width of a class interval. The class size can be deduced from a given frequency
distribution by selecting any class interval and counting how many whole numbers it contains, from its
lower-class limit to its upper-class limit inclusive. In the above example, take any class interval, say
63 - 69. Counting the numbers 63, 64, 65, 66, 67, 68, and 69 shows that there are 7 numbers in the range.
Therefore, the class size is 7. Equivalently, 𝑖 = UCL - LCL + 1 = 69 - 63 + 1 = 7.
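
The counting rule for the class size can be checked with a short Python sketch. The class limits are the ones
quoted in the text; the helper name `class_size` is introduced here only for illustration. Counting every whole
number from the lower-class limit to the upper-class limit inclusive is the same as computing UCL - LCL + 1.

```python
# Class limits as listed in the text for the ages example.
lower_limits = [35, 42, 49, 56, 63, 70, 77, 84]
upper_limits = [41, 48, 55, 62, 69, 76, 83, 90]

def class_size(lcl, ucl):
    """Count the whole numbers from lcl to ucl, both ends included."""
    return ucl - lcl + 1

# Every class interval of one table should have the same class size.
sizes = [class_size(lcl, ucl) for lcl, ucl in zip(lower_limits, upper_limits)]

print(class_size(63, 69))  # 7
print(sizes)               # [7, 7, 7, 7, 7, 7, 7, 7]
```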
Class Boundaries - The class boundaries are the transition points between consecutive class intervals. Each
is the number halfway between the upper-class limit of one class interval and the lower-class limit of the
next. The class boundaries are not usually written out in a frequency distribution table, but they are
understood to be part of it. In the above example, choose any two consecutive class intervals, say 49 - 55
and 56 - 62, and take the midpoint between 55 and 56, which is (55 + 56)/2 = 55.5. This number is the class
boundary between the two classes. Shown below is the complete set of class boundaries for the above
frequency distribution.

Notice in the above table that there are two sets of class boundaries, the lower-class boundaries (LCB) and
the upper-class boundaries (UCB). The class boundaries serve as references for categorizing numbers with
decimal places into the class intervals. For example, 48.35 is counted in the range 42 - 48, not in the
range 49 - 55, because it falls below the class boundary 48.5.
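
The boundaries can be derived from the class limits quoted in the text, as in the Python sketch below. The
helper `find_class` is a hypothetical name, and sending a value that lies exactly on a boundary to the higher
class is an assumption made here; the text does not state a rule for that case.

```python
# Class limits as listed in the text for the ages example.
lower_limits = [35, 42, 49, 56, 63, 70, 77, 84]
upper_limits = [41, 48, 55, 62, 69, 76, 83, 90]

# Half the gap between one UCL and the next LCL (here (42 - 41) / 2 = 0.5).
half_gap = (lower_limits[1] - upper_limits[0]) / 2

lcb = [lcl - half_gap for lcl in lower_limits]  # lower-class boundaries
ucb = [ucl + half_gap for ucl in upper_limits]  # upper-class boundaries

def find_class(x):
    """Return the (LCL, UCL) of the class whose boundaries contain x.

    Assumption: a value exactly on a boundary goes to the higher class.
    Returns None when x lies outside every class.
    """
    for lcl, ucl, low, high in zip(lower_limits, upper_limits, lcb, ucb):
        if low <= x < high:
            return (lcl, ucl)
    return None

print(lcb)                # [34.5, 41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5]
print(ucb)                # [41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, 90.5]
print(find_class(48.35))  # (42, 48)
```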
Frequency (𝒇) - The number of scores that fall into each class interval. In the above example, the entries
in the frequency column are derived from the tallying of scores. You can verify this by comparing the entries
with the above tally and summing the frequencies of the scores that fall within a given class interval.

Once data is organized in a frequency distribution, some general observations can be made, for example that
the majority of the 50 wealthy people are over 55 years old. Statistical measures such as measures of central
tendency, measures of variability, and measures of location can also be computed from data organized in a
frequency distribution table.
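
Filling in the frequency column amounts to counting how many scores fall between the limits of each class
interval. The Python sketch below uses the class limits from the text, but the `ages` list is a hypothetical
placeholder sample, so the resulting frequencies are not those of the Forbes example.

```python
# Class limits as listed in the text for the ages example.
lower_limits = [35, 42, 49, 56, 63, 70, 77, 84]
upper_limits = [41, 48, 55, 62, 69, 76, 83, 90]

# Hypothetical sample of ages (placeholder values, not the Forbes data).
ages = [37, 42, 42, 56, 56, 68, 75, 90]

# Frequency of a class = number of scores between its limits, inclusive.
frequencies = {}
for lcl, ucl in zip(lower_limits, upper_limits):
    frequencies[(lcl, ucl)] = sum(1 for age in ages if lcl <= age <= ucl)

for (lcl, ucl), f in frequencies.items():
    print(f"{lcl} - {ucl}: {f}")
```

A quick consistency check is that the frequencies add up to the number of items in the data set.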

Finally, a frequency distribution can be a grouped frequency distribution, as in the above example, where
the class intervals are number ranges, or a categorical frequency distribution, where the groups are
qualitative categories with no implied rank or order. Below is a set of blood types of 25 patients in a
hospital.

Below is the categorical frequency distribution of the above data on blood types.
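
A categorical frequency distribution can be sketched in Python in the same way as a tally, except that the
groups are category labels rather than number ranges. The `blood_types` list below is a short hypothetical
sample, not the 25-patient data set referred to above.

```python
from collections import Counter

# Hypothetical sample of blood types (not the 25-patient data from the text).
blood_types = ["A", "B", "O", "AB", "O", "A", "O", "B", "O", "A"]

# Count how many times each category occurs.
freq = Counter(blood_types)

# The categories have no implied order; this listing order is arbitrary.
for category in ["A", "B", "AB", "O"]:
    print(category, freq[category])
```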

Steps in Constructing a Frequency Distribution Table
Each step below is accompanied by its application to the example of the ages of the 50 wealthiest
people in the world according to Forbes Magazine.

1. Organize the data into an array (optional). The purpose of arranging the data into an array is to
identify the lowest score (LS) and the highest score (HS) of the data set.
• LS = 37 and HS = 90
2. Compute the range (R) of scores. The range is computed as R = HS - LS.
• R = 90 - 37 = 53
3. Compute the suggested number of class intervals 𝑘 using the formula 𝑘 = 1 + 3.3 𝑙𝑜𝑔 𝑁, where 𝑁
is the number of items in the data set and 𝑙𝑜𝑔 𝑁 is the common (base-10) logarithm of 𝑁. Round off the
result to the nearest whole number. Take note that this result is only a suggested value and can be
adjusted to fit the range of scores in the data set.
• 𝑘 = 1 + 3.3 𝑙𝑜𝑔 50 = 6.606601014 ≈ 7
4. Compute the class size (𝑖) using 𝑖 = 𝑅/𝑘. Round off the result to the nearest whole number. As with
the computed number of class intervals, the value for the class size is only a suggestion, serving as a
guide to how wide the class intervals of the frequency distribution should be. Preferably the class size
should be an odd number, so that class midpoints and other auxiliary values are whole numbers rather
than decimals.
• 𝑖 = 53/7 = 7.571428571 ≈ 8
5. Start the construction of the frequency distribution table with the first and lowest of the class
intervals. Note that the choice of the lower-class limit (LCL) of the first class interval is arbitrary.
The only requirement is that the first class interval should contain the lowest score of the data set.
There are several suggestions as to which number the first class interval should begin with.
Consider each of these suggestions below:
• Start with the lowest score of the data set. For the foregoing example, this would be LS = 37 and
the class interval 37 - 44 (note that the width of the interval is 8, counting 37, 38, 39, 40,
41, 42, 43, and 44, because the class size 𝑖 = 8); or
• Start with the largest multiple of the class size that does not exceed the lowest score of the data
set. For the foregoing example, this would be 32, which is a multiple of 8, giving the class interval
32 - 39 (this interval is neither better nor worse than the interval 37 - 44); or
• Start with the largest multiple of 5 or of 10 that does not exceed the lowest score of the data set.
For the foregoing example, these would be 35 with the class interval 35 - 42 and 30 with the class
interval 30 - 37, respectively.
6. Continue constructing the class intervals until you have covered the highest score of the data
set. Indicate the frequency of each of the class intervals based on the tallying of scores in the data set.
• If all the suggestions above are applied to the foregoing example, using the lowest score as
the starting number for the first class interval, then the resulting frequency distribution will be like
the one shown below.

Note that in the frequency distribution shown above, the last class interval can be removed since no scores
fall in this last range. The modified version of the above frequency distribution is shown below.

Note that the above frequency distribution table is not the one presented in the main discussion of the
material. The frequency distribution used in the main discussion is shown below for easy comparison.

The obvious difference between the two frequency distributions is that one has fewer class intervals than the
other. This is because it uses a larger class size, which results in a more compact frequency distribution.
In general, the larger the class size, the fewer the class intervals and the more compact the frequency
distribution; the smaller the class size, the more class intervals and the more spread out the frequency
distribution. Which of these schemes is better? Neither. It comes down to the intended use and which scheme
better conveys the information the data represent to the audience. As mentioned above, however, a frequency
distribution with an odd-numbered class size is preferable for computation.
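
The construction steps in this section can be sketched in Python as follows. The function names
`suggested_classes` and `build_intervals` are hypothetical helpers introduced for illustration, and the sketch
only builds the class intervals; it does not reproduce the frequencies of the tables in this module. Note that
Python's `round` sends exact halves to the nearest even number, which may differ from the rounding rule used
in class, although no exact halves occur in this example.

```python
import math

def suggested_classes(n):
    """Step 3: suggested number of class intervals, k = 1 + 3.3 log10(N)."""
    return round(1 + 3.3 * math.log10(n))

def build_intervals(lowest, highest, class_size, start=None):
    """Steps 5 and 6: build (LCL, UCL) pairs until the highest score is covered.

    `start` is the LCL of the first interval; it defaults to the lowest score.
    """
    lcl = lowest if start is None else start
    intervals = []
    while True:
        ucl = lcl + class_size - 1
        intervals.append((lcl, ucl))
        if ucl >= highest:
            break
        lcl = ucl + 1
    return intervals

# Values from the worked example: N = 50, LS = 37, HS = 90.
n, ls, hs = 50, 37, 90
r = hs - ls               # Step 2: range, 53
k = suggested_classes(n)  # Step 3: 6.6066... rounds to 7
i = round(r / k)          # Step 4: 7.5714... rounds to 8

intervals = build_intervals(ls, hs, i)
print(r, k, i)    # 53 7 8
print(intervals)  # first interval (37, 44), last interval (85, 92)

# A smaller class size over the same data gives more class intervals.
print(len(intervals))                                # 7
print(len(build_intervals(ls, hs, 7, start=35)))     # 8
```

The last two lines illustrate the trade-off described above: a class size of 8 starting at 37 needs 7 class
intervals to reach the highest score, while a class size of 7 starting at 35 needs 8.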

Test your understanding

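Raw Data (N = 160). The first line below gives the column numbers, and the first number on each following
line is the row number.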

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1 101 91 106 116 107 111 99 110 112 105 129 108 131 121 93 105 124 110 91 89 99 107 120 105 95 106 99 89 84 84
2 94 105 96 80 97 87 86 94 106 105 75 101 102 100 89 99 93 84 89 95 82 107 100 111 119 100 105 117 98 123
3 109 85 98 102 104 96 110 90 106 111 103 102 101 116 116 101 112 110 104 118 97 107 107 117 99 95 107 110 106 112
4 92 110 102 101 108 99 94 108 87 100 105 86 106 83 105 103 81 107 81 96 94 90 89 82 88 87 102 91 108 109
5 103 89 90 109 80 106 107 80 105 106 94 82 101 97 83 82 99 102 100 101 112 98 106 102 95 96 96 77 101 94
6 91 113 92 92 109 85 107 105 107 94

1. Construct a frequency distribution table for the given raw data following the suggested steps
presented in the module.
A. What is the highest score of the distribution? the lowest score?
B. What is the range of the distribution?
C. Compute the suggested number of class intervals (round to the nearest whole number).
D. Compute the suggested class size (round to the nearest whole number).
E. Start the first class interval with the given lowest score.
F. Tally the scores and indicate the frequency of each class interval.

2. Construct another frequency distribution table for the given raw data using a class size of 5 and
starting the class interval with the lowest score of the data set.
