02 Organizing Data

Uploaded by Benjito Dominico
Copyright © All Rights Reserved

Organizing Data

When a set of data is obtained, the data is in its raw form. To make sense of the data, data handlers
must organize it in some meaningful way, such as arranging the data in an array. The most convenient method
of organizing data is to construct a tally of scores and a frequency distribution. Consider the data
described below on the ages of the 50 wealthiest people in the world according to Forbes Magazine.

Looking at the raw data alone reveals little about what these numbers represent, so little use can be
derived from it. The succeeding sections organize the data first into an array, then into a tally, and
finally into a frequency distribution.

Array
An array is an arrangement of data in ascending or descending order. The data above is shown below arranged
in ascending order.

The data is now less chaotic than in its raw form, and some basic information can be read from it.
For instance, the lowest and highest values of the set can be clearly seen. These give an idea
of the overall range of the ages of the 50 people in the set.
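The arrangement step can be sketched in Python as follows. The sketch assumes that the scores are held in a list. The function name `make_array` and the sample ages are hypothetical because the Forbes data set is not reproduced in this text.

```python
def make_array(scores, descending=False):
    """Arrange the scores in ascending (default) or descending order."""
    return sorted(scores, reverse=descending)


# Hypothetical sample of ages for illustration only (not the Forbes data).
sample = [64, 37, 90, 58, 71, 58]
array = make_array(sample)
lowest, highest = array[0], array[-1]  # the ends of the array are the lowest and highest values
print(array)            # [37, 58, 58, 64, 71, 90]
print(lowest, highest)  # 37 90
```

Passing `descending=True` gives the descending arrangement instead.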

Tally
Tallying is the process of identifying all the unique values in a given data set and noting how often each
one occurs within the set. The result of tallying is the frequency of each item within the
given data. Shown below is the tally of items of the above data and the resulting frequencies.
Note that the tallying has identified 32 unique items in the data set along with their frequencies of occurrence.
Although a tally of scores already yields meaningful information, a tally quickly grows bigger and wider
for large data sets. To make it more manageable, we can convert the tally into a frequency distribution
table, as illustrated next.
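A tally can be sketched in Python with `collections.Counter` from the standard library. This class counts how often each unique value occurs. The function name `tally` and the sample ages below are hypothetical illustrations, not the module's data.

```python
from collections import Counter


def tally(scores):
    """Return (value, frequency) pairs for each unique value, in ascending order of value."""
    return sorted(Counter(scores).items())


# Hypothetical sample of ages for illustration only (not the Forbes data).
sample = [64, 37, 90, 58, 71, 58]
table = tally(sample)
for value, freq in table:
    print(value, "|" * freq, freq)  # one tally mark per occurrence
print(len(table), "unique items")
```

For this sample, the tally finds 5 unique values. Their frequencies add up to the 6 scores in the list.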

Frequency Distribution
As mentioned above, the most convenient method of organizing data is by using a frequency distribution. This
is a method where the items from the raw data are organized in table form using classes and frequencies.
Consider the frequency distribution table shown below for the data on the ages of the 50 wealthiest people of
the world.

The first thing to note is that a frequency distribution table has several parts, namely the class intervals,
the frequencies, the class size, the class limits, and the class boundaries. Each of these is defined below.
Class Intervals - The ranges of scores into which the items of the data set are categorized.
Class Limits - The numbers that define the two ends of each class interval. In the above example,
the numbers 35, 42, 49, 56, 63, 70, 77, and 84, which make up the left column of the class intervals,
are the lower-class limits (LCL) of the frequency distribution. The numbers 41, 48, 55, 62, 69, 76, 83, and
90, which make up the right column of the class intervals, are the upper-class limits (UCL).
Class Size (𝒊) - The thickness or width of a class interval. The class size can be deduced from a given frequency
distribution by selecting any class interval and counting how many whole numbers it contains, from its
lower-class limit up to its upper-class limit. In the above example, we can select any class interval, say
63 - 69. Counting the numbers 63, 64, 65, 66, 67, 68, and 69 shows that there are 7 numbers in the range.
Therefore, the class size is 7.
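When the class limits are whole numbers, counting the numbers in an interval gives the same result as computing UCL - LCL + 1. Here is a minimal Python sketch (the function name `class_size` is hypothetical):

```python
def class_size(lcl, ucl):
    """Count the whole numbers from the lower-class limit to the upper-class limit, inclusive."""
    return ucl - lcl + 1


print(class_size(63, 69))  # 7: the numbers 63, 64, 65, 66, 67, 68, 69
# The same value is the gap between consecutive lower-class limits, e.g. 42 - 35.
print(42 - 35)             # 7
```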
Class Boundaries - The class boundaries are the transition points between consecutive class intervals. They are
the numbers halfway between the upper-class limit of one class interval and the lower-class limit of the next class
interval. The class boundaries are not usually written when a frequency distribution table is constructed,
but they are understood to be part of the table nonetheless. In the above example, we could choose any two
consecutive class intervals, say 49 - 55 and 56 - 62, and take the midpoint between the numbers 55 and 56, which
is (55 + 56)/2 = 55.5. This number is the class boundary between the two classes. Shown below is the
complete set of class boundaries for the above frequency distribution.

Notice in the above table that there are two sets of class boundaries: the lower-class boundaries (LCB) and
the upper-class boundaries (UCB). The class boundaries serve as references for placing numbers with
decimal places into the class intervals. For example, a number such as 48.35 is counted in the
interval 42 - 48 (boundaries 41.5 - 48.5), not in the interval 49 - 55.
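The Python sketch below illustrates two ideas: class boundaries as midpoints between adjacent classes, and the use of those boundaries to place decimal values into classes. The function names are hypothetical, and the sketch assumes equally spaced class intervals. It also assigns a value that lies exactly on a shared boundary to the higher class, which is a tie rule the module does not specify.

```python
def class_boundaries(intervals):
    """Return (LCB, UCB) for each (LCL, UCL) class interval.

    Each interior boundary is the midpoint between one class's UCL and the next
    class's LCL, e.g. (55 + 56) / 2 = 55.5. Assumes equally spaced intervals, so
    the outermost boundaries are extended by the same half-gap.
    """
    half_gap = (intervals[1][0] - intervals[0][1]) / 2  # (42 - 41) / 2 = 0.5 here
    return [(lcl - half_gap, ucl + half_gap) for lcl, ucl in intervals]


def find_class(value, intervals):
    """Return the class interval whose boundaries contain value, or None if none does.

    A value exactly on a shared boundary goes to the higher class (assumed tie rule).
    """
    for (lcl, ucl), (lcb, ucb) in zip(intervals, class_boundaries(intervals)):
        if lcb <= value < ucb:
            return (lcl, ucl)
    return None


# Class intervals from the main discussion: 35 - 41, 42 - 48, ..., 84 - 90.
intervals = [(lcl, lcl + 6) for lcl in range(35, 85, 7)]
print(class_boundaries(intervals)[2])  # (48.5, 55.5)
print(find_class(48.35, intervals))    # (42, 48)
```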
Frequency (𝒇) - The number of scores that fall into each class interval. In the above example, the
entries in the frequency column are derived from the tally of scores. You can verify this by comparing the
entries with the above tally and adding up the frequencies of the scores that fall within each class interval.
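To derive the frequency column from a tally, sum the frequencies of the tallied scores that fall within each class interval. The Python sketch below uses a hypothetical tally (the Forbes tally is not reproduced here) together with the class intervals 35 - 41 through 84 - 90 from the main discussion. The function name is hypothetical.

```python
def frequencies_from_tally(tally, intervals):
    """Add up the tallied frequencies of the scores that fall inside each class interval.

    `tally` maps each unique whole-number score to its frequency.
    """
    return [
        sum(freq for score, freq in tally.items() if lcl <= score <= ucl)
        for lcl, ucl in intervals
    ]


# Hypothetical tally for illustration only (not the Forbes data).
tally = {37: 1, 58: 2, 64: 1, 71: 1, 90: 1}
# Class intervals from the main discussion: 35 - 41, 42 - 48, ..., 84 - 90.
intervals = [(lcl, lcl + 6) for lcl in range(35, 85, 7)]
freqs = frequencies_from_tally(tally, intervals)
for (lcl, ucl), f in zip(intervals, freqs):
    print(f"{lcl} - {ucl}: {f}")
```

Here, the frequencies are 1, 0, 0, 2, 1, 1, 0, and 1. They add up to the 6 tallied scores, just as the frequency column of any frequency distribution adds up to N.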
Once data is organized in a frequency distribution, some general observations can be made. For example,
the majority of the 50 wealthiest people are over 55 years old. Statistical measures such as measures of central
tendency, measures of variability, and measures of location can be computed from data organized in a
frequency distribution table.
Finally, a frequency distribution can be a grouped frequency distribution, as in the above example,
where the class intervals are number ranges. It can also be a categorical frequency distribution, where the groups
are qualitative categories with no implied rank or order. Below is a set of blood types of 25 patients in a hospital.

Below is the categorical frequency distribution of the above data on blood types.
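A categorical frequency distribution is simply a count for each category. The Python sketch below uses a short hypothetical list of blood types because the module's data on 25 patients is not reproduced in this text.

```python
from collections import Counter

# Hypothetical blood types for illustration only (not the module's 25-patient data).
blood_types = ["O", "A", "B", "O", "AB", "A", "O", "B", "O", "A"]

distribution = Counter(blood_types)
# Categories have no implied order; here they are listed from most to least frequent.
for category, freq in distribution.most_common():
    print(category, freq)
```

Because the categories have no implied order, listing them by frequency is only one reasonable choice.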

Steps in Constructing a Frequency Distribution Table
Each step below is followed by its application to the example of the ages of the 50 wealthiest
people in the world according to Forbes Magazine.

1. Organize the data into an array (optional). The purpose of arranging the data into an array is to
identify the lowest score (LS) and the highest score (HS) of the data set.
• LS = 37 and HS = 90
2. Compute the range (R) of scores using R = HS - LS.
• R = 90 - 37 = 53
3. Compute the suggested number of class intervals (𝑘) using the formula 𝑘 = 1 + 3.3 𝑙𝑜𝑔 𝑁, where 𝑁
is the number of items in the data set and 𝑙𝑜𝑔 𝑁 is the base-10 logarithm of 𝑁. Round off the result to the nearest
whole number. Take note that this result is only a suggested value and can be adjusted to fit the range of
scores in the data set.
• 𝑘 = 1 + 3.3 𝑙𝑜𝑔 50 = 6.606601014 ≈ 7
4. Compute the class size (𝑖) using 𝑖 = 𝑅/𝑘. Round off the result to the nearest whole number. As with
the number of class intervals, the computed class size is only a suggestion. It serves as a guide to how
wide the class intervals of the frequency distribution can be. Preferably, the
class size should be an odd number so that midpoints and other auxiliary values do not have
decimal parts.
• 𝑖 = 53/7 = 7.571428571 ≈ 8
5. Start the construction of the frequency distribution table with the first and lowest class
interval. Note that the choice of starting number for the lower-class limit (LCL) of the first class interval
is arbitrary. The only requirement is that the first class interval must contain the lowest score of the
data set. There are several suggestions for the number with which the first class interval should begin.
Consider each of these suggestions below:
• Start with the lowest score of the data set. For the foregoing example, this would be LS = 37, giving
the class interval 37 - 44 (note that the interval contains 8 numbers, 37, 38, 39, 40,
41, 42, 43, and 44, because the class size is i = 8); or
• Start with the nearest number below the lowest score of the data set that is a multiple
of the class size. For the foregoing example, this would be 32, which is a multiple of 8, giving the
class interval 32 - 39 (this interval is neither better nor worse than the interval 37 - 44); or
• Start with the nearest number below the lowest score of the data set that is a multiple
of 5 or a multiple of 10. For the foregoing example, these would be 35, with the class interval
35 - 42, and 30, with the class interval 30 - 37, respectively.
6. Continue constructing class intervals until the highest score of the data set is covered. Indicate the
frequency of each class interval based on the tally of scores in the data set.
• Applying the suggestions above to the foregoing example, with the lowest score as the starting number
of the first class interval, gives the frequency distribution shown below.
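Steps 2 to 6 can be sketched in Python using the worked example's values (LS = 37, HS = 90, N = 50). The function names are hypothetical. The formula in step 3 is a commonly used form of Sturges' rule with a base-10 logarithm. Python's built-in `round()` rounds exact halves to the nearest even number, which differs from schoolbook rounding only when a result ends in exactly .5. The tallying part of step 6 needs the raw scores, which are not reproduced here, so the sketch stops at the class intervals.

```python
import math


def suggested_k(n):
    """Step 3: suggested number of class intervals, k = 1 + 3.3 log10(N), rounded."""
    return round(1 + 3.3 * math.log10(n))


def build_intervals(lowest, highest, size, start=None):
    """Steps 5 and 6: list (LCL, UCL) class intervals of the given size, starting at
    `start` (default: the lowest score, which start must not exceed), until the
    highest score is covered."""
    lcl = lowest if start is None else start
    intervals = []
    while lcl <= highest:
        intervals.append((lcl, lcl + size - 1))
        lcl += size
    return intervals


LS, HS, N = 37, 90, 50  # lowest score, highest score, and number of items in the worked example
R = HS - LS             # step 2: 90 - 37 = 53
k = suggested_k(N)      # step 3: 6.6066... rounds to 7
i = round(R / k)        # step 4: 53 / 7 = 7.5714... rounds to 8

for lcl, ucl in build_intervals(LS, HS, i):
    print(f"{lcl} - {ucl}")

# Alternative starting points from step 5: the nearest multiple of i, of 5, or of 10 not above LS.
for base in (i, 5, 10):
    start = (LS // base) * base
    print(base, build_intervals(LS, HS, i, start=start)[0])
```

The first loop prints the seven intervals 37 - 44 through 85 - 92. The alternative starting points give the first intervals 32 - 39, 35 - 42, and 30 - 37, which match the suggestions in step 5.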

Note that in the frequency distribution shown above, the last class interval can be removed because no scores
fall in this last range. The modified version of the above frequency distribution is shown below.
Note that this frequency distribution table is not the one presented in the main discussion of the material.
The frequency distribution used in the main discussion is reproduced below for easy comparison.

The obvious difference between the two frequency distributions is that one has fewer class intervals than the
other. This is because one uses a larger class size, which results in a more compact frequency distribution.
In general, a larger class size gives a more compact frequency distribution with fewer class intervals, and
a smaller class size gives a more spread-out frequency distribution with more class intervals. Which
of these schemes is better? Neither is better in general. The choice depends on the intended use and on which
scheme better conveys the information the data represents when presented to the audience. However, as mentioned
above, a frequency distribution with an odd-numbered class size is preferable for computation.
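The effect of class size, and the reason an odd class size is convenient, can be checked with a little arithmetic. In the Python sketch below, the number of intervals is computed as the ceiling of (HS - first LCL + 1) / 𝑖, which assumes whole-number scores. The class midpoint is taken as the average of the class limits. Both function names are hypothetical.

```python
import math


def number_of_intervals(lowest, highest, size, start=None):
    """Count the class intervals of the given size needed to cover the scores from
    `start` (default: the lowest score) up to the highest score, assuming whole-number scores."""
    first = lowest if start is None else start
    return math.ceil((highest - first + 1) / size)


def midpoint(lcl, ucl):
    """Class midpoint: the average of the lower- and upper-class limits."""
    return (lcl + ucl) / 2


print(number_of_intervals(37, 90, 8))            # 7: from 37 - 44 up to 85 - 92
print(number_of_intervals(37, 90, 7, start=35))  # 8: from 35 - 41 up to 84 - 90
print(midpoint(37, 44))  # 40.5: an even class size gives a midpoint with a decimal part
print(midpoint(35, 41))  # 38.0: an odd class size gives a whole-number midpoint
```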
Test your understanding

Raw Data (N = 160)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1 101 91 106 116 107 111 99 110 112 105 129 108 131 121 93 105 124 110 91 89 99 107 120 105 95 106 99 89 84 84
2 94 105 96 80 97 87 86 94 106 105 75 101 102 100 89 99 93 84 89 95 82 107 100 111 119 100 105 117 98 123
3 109 85 98 102 104 96 110 90 106 111 103 102 101 116 116 101 112 110 104 118 97 107 107 117 99 95 107 110 106 112
4 92 110 102 101 108 99 94 108 87 100 105 86 106 83 105 103 81 107 81 96 94 90 89 82 88 87 102 91 108 109
5 103 89 90 109 80 106 107 80 105 106 94 82 101 97 83 82 99 102 100 101 112 98 106 102 95 96 96 77 101 94
6 91 113 92 92 109 85 107 105 107 94

1. Construct a frequency distribution table for the given raw data following the suggested steps
presented in the module.
A. What is the highest score of the distribution? What is the lowest score?
B. What is the range of the distribution?
C. Compute the suggested number of class intervals (round to the nearest whole number).
D. Compute the suggested class size (round to the nearest whole number).
E. Start the first class interval with the lowest score.
F. Tally the scores and indicate the frequency of each class interval.

2. Construct another frequency distribution table for the given raw data using a class size of 5 and
starting the first class interval with the lowest score of the data set.
