Data Management
Learning Outcomes
Key Concepts
The data gathered should be properly organized into grouped data called a frequency distribution. How do we
construct a frequency distribution? Consider the following steps:
11 19 11 15 16 10
16 16 15 17 10 27
21 11 13 21 10 16
11 19 24 12 22 13
19 13 18 20 21 11
19 13 18 25 29 11
16 23 10 17 11 27
16 24 12 21 13 12
26 15 11 14 10 12
11 15 18 12 20 13
Solution:
k = 1+ 5.3344536
Exercises: Construct a frequency polygon for the following data, the scores of students in a Geometry
test.
55 63 44 37 50 57 44 57 42 46
58 40 54 65 39 27 28 56 38 45
30 35 56 78 55 27 50 28 44 28
39 37 65 43 33 70 60 61 60 44
Interpretation of Data
Data in statistics are of little use unless we interpret them. The most useful measures
for describing a distribution of observations are the measures of central tendency,
measures of variation, measures of relative position, z-scores, the box-and-whisker plot, probability and
the normal curve, and linear regression and correlation.
Central Tendency determines a numerical value in the central region of a distribution of scores.
Central tendency refers to the center of a distribution of observations. There are three measures of central
tendency: the mean, the median and the mode. These are used when the general or overall performance
of a class is compared with that of other classes.
1. Mean
The mean, Mn, is also called the arithmetic mean or average. It can be affected by extreme scores.
It is used if the most reliable measure is desired and when there are a few with very high values
and a few with very low values. The mean is the balance point of score distribution.
A. Ungrouped Data:
The mean of ungrouped data is the sum of the scores divided by the number of scores: Mn = ΣX/N.
Example 1: Jeffrey has been working on programming and updating a Web site for his company for the
past 24 months. The following numbers represent the number of hours Jeffrey has worked on
his Web site for each of the past 7 months: 24, 25, 33, 50, 53, 66, 78. What is the mean
(average) number of hours that Jeffrey worked on this Web site each month?
Solution:
Step 1: Add the numbers to determine the total number of hours he worked.
24 + 25 + 33 + 50 + 53 + 66 + 78 = 329
Step 2: Divide the total by the number of months.
Mean = 329/7 = 47. Jeffrey worked an average of 47 hours on this Web site each month.
Example 2: The following are Marivic’s scores in Statistics quizzes: 70, 72, 77, 78, 86, 84, and 79.
Solution:
a. mean = (70 + 72 + 77 + 78 + 86 + 84 + 79)/7 = 546/7 = 78
b. To answer b, we subtract the mean from each score and sum up
the differences.
70 – 78 = -8
72 – 78 = -6
77 – 78 = -1
86 – 78 = 8
78 – 78 = 0
84 – 78 = 6
79 – 78 = 1
Sum = 0
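The two steps of Example 2 can be checked with a few lines of Python. This is only a sketch: the score list is the one used in the solution (including the 84), and the variable names are illustrative.

```python
from statistics import mean

# Quiz scores as they appear in the worked solution of Example 2
scores = [70, 72, 77, 78, 86, 84, 79]

mn = mean(scores)                         # arithmetic mean
deviations = [x - mn for x in scores]     # each score minus the mean
total_deviation = sum(deviations)         # sums to zero for the arithmetic mean

print(mn, deviations, total_deviation)
```

The deviations cancel out, which is what "the mean is the balance point" refers to.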
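c. Change the lowest and the highest scores. Let 20 be the lowest score and 100 be
the highest. Therefore:
mean = (20 + 72 + 77 + 78 + 100 + 84 + 79)/7 = 510/7 = 72.86
The mean moves from 78 to 72.86, which shows that it is affected by extreme scores.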
Example
There are 1,000 notebooks sold at Php10 each; 500 notebooks at Php20 each; 500
notebooks at Php25 each; and 100 notebooks at Php30 each. Compute the
weighted mean.
Solution:
Price (X)   Number sold (f)   fX
10          1,000             10,000
20          500               10,000
25          500               12,500
30          100               3,000
            N = 2,100         ΣfX = 35,500
Weighted mean = ΣfX/N = 35,500/2,100 = Php16.90
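As a quick check of the notebook example, the weighted mean can be computed directly from the price and quantity pairs. The sketch below uses only the figures stated in the problem; the variable names are illustrative.

```python
# (price in Php, number of notebooks sold) pairs from the example
sales = [(10, 1000), (20, 500), (25, 500), (30, 100)]

total_sold = sum(qty for _, qty in sales)                 # N
total_value = sum(price * qty for price, qty in sales)    # sum of price x quantity
weighted_mean = total_value / total_sold                  # average price per notebook

print(total_sold, total_value, round(weighted_mean, 2))
```

The cheaper notebooks sold in larger numbers, so the weighted mean sits closer to Php10 than a plain average of the four prices would.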
B. Grouped Data
There are two ways on how to solve for the value of mean given the grouped data or
frequency distribution.
a. Midpoint method: Mn = ΣfXm / N
b. Coded deviation method: Mn = X0 + (ΣfXc)i / N
Where Mn = mean
f = frequency
Xm = class mark (midpoint of the class)
X0 = assumed mean, chosen from among the Xm values
Xc = coded deviation of a class mark from the assumed mean, (Xm – X0)/i
ΣfXm = sum of the products of frequencies and class marks
N = total frequency
i = class interval (class size)
Example:
The table below summarizes the weights of the cubs. Find the average weight of the
cubs.
Reminder: The class mark is the average of the upper class limit and
the lower class limit of each class in the given frequency distribution.
Solution:
In solving for the mean of grouped data or a frequency distribution, we have to
add columns for the class mark (Xm) and fXm.
Therefore: Mn = X0 + (ΣfXc)i / N
= 185.5 + (-21 x 10)/45
= 185.5 – 4.6667
= 180.8333
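The two grouped-data methods should give the same mean. The sketch below illustrates this in Python. Since the cubs table is not reproduced here, it borrows the frequency table from the grouped mean deviation example later in this handout; the assumed mean of 22 is an arbitrary pick among the class marks.

```python
# (lower limit, upper limit, frequency); table borrowed from the grouped
# mean deviation example, NOT the cubs table
classes = [(30, 34, 4), (25, 29, 5), (20, 24, 6), (15, 19, 2), (10, 14, 3)]

freqs = [f for _, _, f in classes]
marks = [(lo + hi) / 2 for lo, hi, _ in classes]   # class marks Xm
n = sum(freqs)                                     # total frequency N

# a. Midpoint method: Mn = sum(f * Xm) / N
mean_midpoint = sum(f * xm for f, xm in zip(freqs, marks)) / n

# b. Coded deviation method: Mn = X0 + sum(f * Xc) * i / N
i = 5        # class size
x0 = 22.0    # assumed mean (an arbitrary choice among the class marks)
coded = [(xm - x0) / i for xm in marks]            # Xc = (Xm - X0) / i
mean_coded = x0 + sum(f * xc for f, xc in zip(freqs, coded)) * i / n

print(mean_midpoint, mean_coded)
```

Any class mark can serve as the assumed mean; choosing one near the middle simply keeps the coded values small.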
Exercises:
1. The sizes of pants sold during one business day in a department store are 32, 38, 34, 42, 36, 34,
40, 44, 32, 34. Find the average size of the pants sold.
2. Given the frequency distribution for the weights of 50 pieces of luggage.
2. Median
The median, Md, is the value in the distribution that divides an arranged (ascending/descending) set
into two equal parts. It is the midpoint or middlemost of a distribution of scores. Fifty percent of the scores
fall above it and 50% fall below it. It is also known as the 50th percentile. It is not affected by extreme
scores. This is used when the distribution of scores is skewed. The median separates the distribution into
two equal parts.
A. Ungrouped Data
The median is obtained by inspecting the middlemost value of the distribution after it has been arranged in
ascending or descending order. It can also be located using the formula: the median is the score in the
(N + 1)/2th position of the arranged distribution.
Examples:
Solution:
Php 12, Php 35, Php 48, Php 50, Php 55, Php 60, Php 65, N = 7
Md = (N + 1)/2th score = (7 + 1)/2 = 4th score
Therefore:
Md = Php 50
N = 6
Md = (N + 1)/2th score
Md = (6 + 1)/2 = 3.5th score, that is, midway between the 3rd and the 4th scores.
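The (N + 1)/2 position rule can be sketched in Python as follows. The function name is illustrative, and the six-value list in the usage lines is made up only to show the even-N case.

```python
def median_by_position(values):
    """Median of ungrouped data using the (N + 1)/2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                 # 1-based position of the median
    lower = ordered[int(position) - 1]     # score at the whole-number position
    if position.is_integer():
        return lower                       # odd N: a single middle score
    upper = ordered[int(position)]         # even N: next score up
    return (lower + upper) / 2             # halfway between the two middle scores

print(median_by_position([12, 35, 48, 50, 55, 60, 65]))   # N = 7, 4th score
print(median_by_position([1, 2, 3, 4, 5, 6]))             # N = 6, made-up data
```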
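B. Grouped Data
Md = XLB + [(N/2 – cfb)/fm]i
where Md = median
XLB = the lower boundary or true lower limit of the median class
N = total frequency
cfb = cumulative frequency before the median class
fm = frequency of the median class
i = class size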
Solution:
N/2 = 60/2 = 30th score
The median class is 14-15, since it contains the 30th score.
XLB = 13.5
cfb = 24
fm = 6
i = 2
Md = XLB + [(N/2 – cfb)/fm]i
Md = 13.5 + [(30 – 24)/6]2
Md = 13.5 + 2
Md = 15.5
This means that 50 percent of the students got a score below 15.5. If the passing score is 50 percent of
the total number of items, almost half of the class failed the test.
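The grouped median computation can be sketched in Python. The sketch assumes integer class limits (so each lower boundary is the lower limit minus 0.5) and uses the 60-score frequency table with classes 10-11 up to 28-29 that appears in the mode section; the function name is illustrative.

```python
def grouped_median(classes, size):
    """classes: (lower limit, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    half = n / 2                       # position of the median score
    cfb = 0                            # cumulative frequency before the class
    for lower, f in classes:
        if cfb + f >= half:            # first class that reaches the N/2-th score
            xlb = lower - 0.5          # lower boundary, assuming integer limits
            return xlb + (half - cfb) / f * size
        cfb += f

scores = [(10, 14), (12, 10), (14, 6), (16, 8), (18, 6),
          (20, 6), (22, 3), (24, 3), (26, 3), (28, 1)]
print(grouped_median(scores, 2))
```

With this table the function finds the median class 14-15 with cfb = 24 and fm = 6, the same values used in the solution above.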
Exercises:
1. The ages of 10 Administrators in a certain college are given as follows: Compute the median.
N/2 = 60/2 = 30th score
XLB = 54.5
cfb = 25
fm = 12
i = 5
Md = 54.5 + [(30 – 25)/12]5
Md = 54.5 + 2.083333
Md = 56.58
3. Mode
The mode is the value with the largest frequency. It is the value that occurs most frequently in
the distribution. This is used when the quickest estimate of typical performance is wanted. A
distribution can be unimodal with one mode value, bimodal with two mode values and trimodal
with three mode values. In other words, it can have more than one mode.
Solution:
Class Frequency f
28-29 1
26-27 3
24-25 3
22-23 3
20-21 6
18-19 6
16-17 8
14-15 6
12-13 10
10-11 14 = modal class
N= 60
Solution:
M0 = XLB + [df1/(df1+df2)]i
XLB = 9.5
df1 = 14 – 0
= 14
df2 = 14 – 10
= 4
i = 2
M0 = 9.5 + [14/(14+4)]2
= 9.5 + 1.56
= 11.06
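A minimal Python sketch of the grouped mode formula is given below. It assumes integer class limits, treats a missing neighboring class as having frequency 0, and takes df1 as the difference from the class below the modal class and df2 as the difference from the class above it. The function name is illustrative.

```python
def grouped_mode(classes, size):
    """classes: (lower limit, frequency) pairs in ascending order."""
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))                          # modal class (first, if tied)
    below = freqs[k - 1] if k > 0 else 0                 # class just below the modal class
    above = freqs[k + 1] if k + 1 < len(freqs) else 0    # class just above it
    df1 = freqs[k] - below
    df2 = freqs[k] - above
    xlb = classes[k][0] - 0.5                            # lower boundary of the modal class
    return xlb + df1 / (df1 + df2) * size

scores = [(10, 14), (12, 10), (14, 6), (16, 8), (18, 6),
          (20, 6), (22, 3), (24, 3), (26, 3), (28, 1)]
print(round(grouped_mode(scores, 2), 2))
```

The sketch does not handle the degenerate case where df1 + df2 is zero, which can only happen when the neighboring classes have the same frequency as the modal class.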
EXERCISES
Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60
M0 = XLB + [df1/(df1+df2)]i
XLB = 54.5
df1 = 12 – 7 = 5
df2 = 12 – 8 = 4
i = 5
M0 = 54.5 + [5/(5+4)]5
M0 = 54.5 + 2.77777778
M0 = 57.28
Just as the median divides the set of scores into two equal parts, there are other measures
that divide a distribution into one hundred, four, or ten equal parts. These are the other measures of
position: the percentiles, the quartiles, and the deciles.
A. The Percentiles
One way of assessing performance is by the use of percent. The percentiles are
the score-points that divide a distribution into 100 equal parts. For example, the
10th percentile (P10) separates the lowest 10% from the other 90%; the 25th
percentile (P25) separates the lowest 25% from the other 75%; and the 80th
percentile (P80) separates the lowest 80% from the other 20%.
Consider this situation: Juan got a score of 60 and ranked ninth (9th) in
a class of 150 students. This means that 150 – 9 = 141 students ranked below him. If we
get the percentage, 141/150 = 0.94 = 94%. This means that 94% of the class ranked
below or got scores below Juan. We can then say that the percentile rank of Juan
in the class is 94, which also implies that 94 out of 100 students got scores below
his score, and about 5% of the class (8 of 150) obtained scores higher than Juan.
The percentile rank tells what percent of the cases fall below a given
rank or position. The score of Juan is 60, so we can say that the 94th percentile point is
60. The percentile point is the score or value that corresponds to the given
percentile rank. It is denoted by the symbol Pn, where n is the percentile rank.
Thus in the example, P94 = 60.
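Juan's percentile rank can be reproduced with a short Python sketch. The function name and arguments are illustrative; the rule it applies is the one used in the text, where the students ranked below are the class size minus the rank.

```python
def percentile_rank(rank, class_size):
    """Percent of the class ranked below a student holding the given rank."""
    below = class_size - rank            # students ranked below (150 - 9 = 141)
    return 100 * below / class_size

juan = percentile_rank(9, 150)
print(juan)
```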
a. Ungrouped Data
Examples:
1. Mrs. Corpuz gave a quiz to ten students. The scores obtained are as follows:
5, 8, 7, 6, 3, 6, 10, 5, 6, 4
a. What score corresponds to the 100th percentile?
b. What is the 50th percentile point?
Solution:
Solution:
a. P65 implies that Jason got a score higher than 65 percent of the class.
b. Since there are 50 students in all, the number of students who got scores below
Jason is 50(65%) = 50(0.65) = 32.5.
3. John has a height corresponding to a percentile rank of 80, so 20(0.80) = 16 boys in the group are
shorter than John. The number of boys who are taller than John is 20 – 16 – 1 = 3.
b. Grouped Data
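To compute the percentile of grouped data, the following formula is used:
Pn = XLB + [i(nN/100 – F)/f]
where Pn = the nth percentile
XLB = the lower boundary of the percentile interval
N = total frequency
F = cumulative frequency below the percentile interval
f = frequency of the percentile interval
i = the class size
f = 6, F = 38, i = 2, N = 60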
Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60
B. The Quartiles
The quartiles are points that divide a distribution into four equal parts.
Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60
C. The Deciles
The deciles are points that divide a distribution into ten equal parts.
Example:
Class Interval f cf
60 – 62 2 40
57 – 59 2 38
54 – 56 4 36
51 – 53 5 32
48 – 50 11 27
45 – 47 8 16
42 – 44 4 8
39 – 41 2 4
36 – 38 1 2
33 – 35 1 1
N = 40
Find Q1, P10, and D2.
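Solution:
a. Q1 = XLB + [i(N/4 – F)/f]
N/4 = 0.25 x 40 = 10; the 10th score is found in the class 45 – 47, the quartile interval.
Class Interval f cf
60 – 62 2 40
57 – 59 2 38
54 – 56 4 36
51 – 53 5 32
48 – 50 11 27
45 – 47 8 16 (quartile interval)
42 – 44 4 8
39 – 41 2 4
36 – 38 1 2
33 – 35 1 1
N = 40
XLB = 44.5, F = 8, f = 8, i = 3
Q1 = 44.5 + [3(10 – 8)/8]
= 44.5 + 0.75
= 45.25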
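b. P10 = XLB + [i(10N/100 – F)/f]
0.10 x 40 = 4; the 4th score is found in the class 39 – 41.
XLB = 38.5, F = 2, f = 2, i = 3, N = 40
P10 = 38.5 + [3(4 – 2)/2]
= 38.5 + 3 = 41.5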
c. D2 = XLB + [i(2N/10 – F)/f]
n = 20%
0.20 x 40 = 8; the 8th score is found in the class 42 – 44.
Class Interval f cf
60 – 62 2 40
57 – 59 2 38
54 – 56 4 36
51 – 53 5 32
48 – 50 11 27
45 – 47 8 16
42 – 44 4 8 (decile interval)
39 – 41 2 4
36 – 38 1 2
33 – 35 1 1
N = 40
XLB = 41.5, F = 4, f = 4, i = 3
D2 = 41.5 + [3(8 – 4)/4]
= 41.5 + 3 = 44.5
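Quartiles, deciles and percentiles of grouped data all use the same formula with a different fraction of N, so one function can cover the three. The Python sketch below assumes integer class limits and picks the first class whose cumulative frequency reaches the target position, which is the convention the solutions above appear to follow. The function name and its `k`/`parts` arguments are illustrative.

```python
def grouped_quantile(classes, size, k, parts):
    """k-th of `parts` equal divisions (Q1 -> k=1, parts=4; P10 -> k=10, parts=100).

    classes: (lower limit, frequency) pairs in ascending order.
    """
    n = sum(f for _, f in classes)
    target = n * k / parts               # position of the wanted score
    cum = 0                              # F: cumulative frequency below the class
    for lower, f in classes:
        if cum + f >= target:            # first class that reaches the target
            xlb = lower - 0.5            # lower boundary, assuming integer limits
            return xlb + size * (target - cum) / f
        cum += f

table = [(33, 1), (36, 1), (39, 2), (42, 4), (45, 8),
         (48, 11), (51, 5), (54, 4), (57, 2), (60, 2)]

q1 = grouped_quantile(table, 3, 1, 4)
p10 = grouped_quantile(table, 3, 10, 100)
d2 = grouped_quantile(table, 3, 2, 10)
print(q1, p10, d2)
```

Passing the fraction as `k` and `parts` rather than a decimal keeps the target position exact for cases such as 10% of 40.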
Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60
Measure of Variation
Measures of variation describe the degree of spread of the values. They are
commonly called the measures of dispersion or measures of spread. There are six measures of
variation: the range, the quartile deviation, the interquartile range, the mean deviation, the variance
and the standard deviation. They are used to determine how varied, dispersed, scattered or distant,
or how close, clustered or near, the performances of the members of a group are. They also describe the
heterogeneity or homogeneity of the group.
A. Range
The range is the difference between the highest score (h.s) and the lowest score (l.s). It gives
us the quickest estimate. It shows the two extreme scores of a set of data. For grouped data,
the range can be calculated by subtracting the lower boundary (l.b) of the lowest class
interval from the upper boundary (u.b) of the highest class interval.
Examples.
1. Find the range of the following data:
a. 10, 12, 12, 14 : R = 14 – 10 = 4
b. 80, 100, 100, 120 : R = 120 – 80 = 40
c. 45, 50, 50, 55 : R = 55 – 45 = 10
Recall: Range = upper class boundary of the topmost class – lower class boundary of the bottom class.
Solution:
R = 29.5 – 9.5
= 20
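Both versions of the range can be sketched in Python. The function names are illustrative, and the grouped version assumes integer class limits, so each boundary lies half a unit beyond its limit.

```python
def range_ungrouped(values):
    """Highest score minus lowest score."""
    return max(values) - min(values)

def range_grouped(lowest_limit, highest_limit, half_unit=0.5):
    """Upper boundary of the top class minus lower boundary of the bottom class."""
    return (highest_limit + half_unit) - (lowest_limit - half_unit)

print(range_ungrouped([10, 12, 12, 14]))
print(range_ungrouped([80, 100, 100, 120]))
print(range_grouped(10, 29))     # classes run from 10-11 up to 28-29
```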
EXERCISES
Class Interval f
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30
MD = Σ|X – Mn| / N
Where:
X = a score in the distribution
Mn = the mean
N = the number of observations
Example. Find the mean deviation of the following ungrouped distribution: 4, 8, 12.
Solution:
a. Calculate the mean. Mn = 24/3 = 8
X |X – Mn |
4 4
8 0
12 4
Σ| X – Mn| = 8
MD = 8/3 = 2.67
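The same computation can be sketched in Python; the function name is illustrative.

```python
def mean_deviation(values):
    """Average absolute distance of the scores from their mean."""
    n = len(values)
    mn = sum(values) / n                            # the mean, Mn
    return sum(abs(x - mn) for x in values) / n     # sum of |X - Mn| over N

md = mean_deviation([4, 8, 12])
print(round(md, 2))
```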
b. Grouped Data
For a grouped frequency distribution, the formula is
MD = Σf|Xm – Mn| / N
Solution:
Calculate the mean using the midpoint method, Mn = ΣfXm/N. We add columns for Xm and fXm.
X f Xm fXm
30-34 4 32 128
25-29 5 27 135
20-24 6 22 132
15-19 2 17 34
10-14 3 12 36
N = 20 ΣfXm = 465
Mn = 465/20 = 23.25
X f Xm fXm |Xm – Mn| f|Xm – Mn|
30-34 4 32 128 8.75 35.00
25-29 5 27 135 3.75 18.75
20-24 6 22 132 1.25 7.50
15-19 2 17 34 6.25 12.50
10-14 3 12 36 11.25 33.75
N = 20 Σf|Xm – Mn| = 107.50
MD = Σf|Xm – Mn| / N = 107.50/20 = 5.375
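A Python sketch of the grouped mean deviation, using the frequency table of this example, is shown below. The function name is illustrative, and the mean is obtained with the midpoint method.

```python
def grouped_mean_deviation(classes):
    """classes: (lower limit, upper limit, frequency) triples."""
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]     # (class mark Xm, f)
    mn = sum(f * xm for xm, f in marks) / n                   # midpoint-method mean
    total = sum(f * abs(xm - mn) for xm, f in marks)          # sum of f|Xm - Mn|
    return mn, total, total / n

table = [(30, 34, 4), (25, 29, 5), (20, 24, 6), (15, 19, 2), (10, 14, 3)]
mn, total, md = grouped_mean_deviation(table)
print(mn, total, md)
```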
EXERCISE
Find the mean deviation.
Ungrouped data
Example: Find the variance and the standard deviation of the following distribution: 4, 8, 12.
Mn = (4 + 8 + 12)/3 = 8
X (X – Mn)²
4 16
8 0
12 16
Σ(X – Mn)² = 32
S = √[Σ(X – Mn)²/(N – 1)] = √[32/(3 – 1)] = √16 = 4
S² = 4² = 16
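Python's standard `statistics` module uses the same N – 1 divisor, so it can serve as a check on the hand computation. The snippet below is only a sketch with illustrative variable names.

```python
from statistics import stdev, variance

data = [4, 8, 12]

s2 = variance(data)   # sample variance: sum of squared deviations / (N - 1)
s = stdev(data)       # sample standard deviation: square root of the variance

print(s2, s)
```

The module also offers `pvariance` and `pstdev`, which divide by N instead and would give different values for the same data.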
B. Grouped Data
For data organized in a frequency distribution, the standard deviation is
computed this way:
S = √[Σf(X – Mn)²/(N – 1)]
N = 40, Σf(X – Mn)² = 2113.00
S = √(2113.00/39) = 7.36
S² = 7.36² = 54.1696
z-scores
z-scores (also known as standard scores) measure how many standard deviations an
observation is above or below the mean. A positive z-score gives the number of standard
deviations a score is above the mean, and a negative z-score gives the number of standard
deviations a score is below the mean. The z-score can be computed using the formula
z = (x − µ)/σ
Example 1: John got 76 marks in his Statistics test. If the marks of the whole class had a mean
of 52 and a standard deviation of 8, what was John’s standard score?
Solution:
z = (76 − 52)/8 = 3
Example 2: Given the mean of 55 and standard deviation of 8, what score corresponds to two
standard deviations above the mean?
Solution:
z = (x − µ)/σ
2 = (x − 55)/8
16 = x – 55
x = 16 + 55 = 71
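Examples 1 and 2 use the same formula in opposite directions, which a small Python sketch makes explicit. The function names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def score_from_z(z, mean, sd):
    """Inverse of z_score: the raw score that sits z standard deviations from the mean."""
    return mean + z * sd

print(z_score(76, 52, 8))        # Example 1: John's standard score
print(score_from_z(2, 55, 8))    # Example 2: two standard deviations above the mean
```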
Example 3: Given the following data, in which subject did Roel perform poorly?
Subject x µ σ
Math 90 85 1.5
English 95 97 2.0
MAPEH 94 92 1.75
Z(Math) = (x − µ)/σ = (90 − 85)/1.5 = 3.33
Z(English) = (95 − 97)/2.0 = −2/2 = −1
Z(MAPEH) = (94 − 92)/1.75 = 2/1.75 = 1.14
Therefore, Roel performed poorly in English because English has the smallest
z-score.
EXERCISES
X µ 𝜎 𝑧
1 9.5 5 3
2 -10 9 0.1
3 32.1 7 2.3
4 14 4.5 -0.7
5 -19 -7 -2.4
2. The standard score of Mary in the Chinese Test is 1.2 and her standard score in the
English Test is 0.8. In what subject did she perform better?
3. Kay and Ed’s results in a Grammar test and a Speech test among 100 pupils are in the
table below:
Grammar Speech
Kay 82 65
Ed 77 70
Mean 65 70
Sd 15 10
[Figure: parts of a box-and-whisker plot, labeling the two whiskers, the lowest and highest values, and the quartiles]
Example 1: Draw the box-and-whisker plot for the following data set:
77, 79, 80, 86, 87, 87, 94, 99
Solution:
Find the minimum value, Q1, Q2, Q3 and the maximum value.
min: 77, Q1 = 79.5, Q2 = 86.5, Q3 = 90.5, max: 99
[Figure: box-and-whisker plot drawn on a number line marked 80, 90, 100]
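The five values needed for a box-and-whisker plot can be computed with a short Python sketch. It assumes the "median of each half" rule for Q1 and Q3, leaving out the middle score when N is odd; other quartile conventions exist and can give slightly different values. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                    # scores below the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:]   # scores above the median
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [77, 79, 80, 86, 87, 87, 94, 99]     # data set of Example 1
print(five_number_summary(data))
```

The box is drawn from Q1 to Q3 with a line at Q2, and the whiskers extend to the minimum and maximum values.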
Example 2: Draw a box-and-whisker plot for the following data set and find the outliers.
4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4
Solution:
Arrange the values in order to find the median.
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
Median = Q2 = 4.4
Subset 1: 3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4 median=Q1 = (4.3+4.3)/2=4.3
Subset 2: 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1 median=Q3 = (4.7+4.8)/2=4.75
Therefore: min: 3.9, Q1 = 4.3,