Data Management


DATA MANAGEMENT

Learning Outcomes

At the end of this Chapter, you must be able to:

1. Use a variety of statistical tools to process and manage numerical data;
2. Use the methods of linear regression and correlation to predict the value of a variable given
certain conditions; and
3. Advocate the use of statistical data in making important decisions.

Key Concepts

Gathering and Organizing of Data

Data (Asaad, 2004) are the quantities (numbers) or
qualities (attributes) measured or observed that are to be collected
and/or analyzed. A collection of data is called a data set. The two
categories of data are categorical and continuous. Categorical
data are measured on nominal and ordinal scales, while continuous
data are measured on interval and ratio scales. A nominal scale
consists of a finite set of possible values having no particular
order. Some examples include gender, mode of transportation,
nationality, occupation, and civil status. An ordinal scale is a set
of possible values having a specific order. Some examples are pain
level, social status, and attitude toward the subject. Interval and
ratio scales are measured on a continuum, and the differences
between any two numbers on the scale are of known size; a ratio
scale also has a true zero point. Temperature in degrees Celsius is
an example of an interval scale, while tons of garbage, number of
arrests, income, and age are examples of ratio scales. The scales
need to be distinguished because the appropriate statistical method
varies according to the type of data. Categorical data generally use
non-parametric statistics, while continuous data generally use
parametric statistics.
A variable refers to a property that can take on different
values or categories which cannot be predicted with certainty.
The three common types of variables are independent variables
(X), also called explanatory variables, which may be continuous,
nominal, or ordinal; dependent variables (Y), also called
response variables; and control variables (Z). Variables can also
be classified as qualitative or quantitative. A quantitative
variable is one that can be measured and ordered according to
quantity, while a qualitative variable is one simply used as a
label to distinguish one group from another. A discrete variable
takes a finite or countably infinite number of values, while a
continuous variable covers the values in an interval of the real
number line.
The data gathered shall be presented, analyzed, and
interpreted in a way that can be easily understood by the reader.
Data may be presented in textual, tabular, or graphical form, or
in a combination of these. Textual presentation uses statements
with numerals to describe the data in expository form; it
discusses the data and the information and interpretation they
carry. Tabular presentation uses a statistical table to directly
display the quantities or values collected as data. Graphical
presentation illustrates data in the form of graphs, helping
readers understand the text easily. A graph is often the most
attractive, effective, and convincing way to present data. There
are various types of graphs we can prepare, such as a bar graph,
circle graph, line graph, or pictograph.
[Figures: circle graph (Sales by quarter), bar graph, line graph, and pictograph]

The data gathered should be properly organized into grouped data called a frequency distribution. To
construct a frequency distribution, consider the following steps:

1. Estimate the number of classes, k = 1 + 3 log n, where n is the number of observations and log is
the common (base-10) logarithm.
2. Determine the range, r = highest value – lowest value.
3. Obtain the class size, c = r/k, rounded to a convenient whole number.
4. Set the lowest value as the first lower limit and get the upper limit, which is equal to the first lower
limit + class size – 1.
5. Repeat the process until you reach the class that includes the highest value in
the data.
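
The steps above can be sketched in Python. This is only an illustrative sketch: the function names `build_classes` and `tally` and the sample scores are hypothetical, whole-number data are assumed, and rounding the class size to the nearest whole number is an assumption about how step 3 is applied.

```python
import math

def build_classes(data):
    """Return (k, r, c, class limits) for whole-number data, following the five steps."""
    n = len(data)
    k = 1 + 3 * math.log10(n)        # step 1: estimated number of classes
    r = max(data) - min(data)        # step 2: range
    c = max(1, round(r / k))         # step 3: class size, rounded (assumption)
    limits = []
    lower = min(data)                # step 4: first lower limit
    while lower <= max(data):        # step 5: repeat until the highest value is covered
        limits.append((lower, lower + c - 1))
        lower += c
    return k, r, c, limits

def tally(data, limits):
    """Count how many values fall inside each class."""
    return [sum(lo <= x <= hi for x in data) for lo, hi in limits]

# Hypothetical scores, not taken from the text
scores = [10, 12, 15, 29, 18, 21, 11, 25, 13, 16]
k, r, c, limits = build_classes(scores)
freqs = tally(scores, limits)
for (lo, hi), f in zip(limits, freqs):
    print(f"{lo}-{hi}: {f}")
```

Because the limits are built upward from the lowest value, the number of classes actually produced can differ by one from the estimate k.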
Example 1. Construct a frequency polygon for the following data:

11 19 11 15 16 10
16 16 15 17 10 27
21 11 13 21 10 16
11 19 24 12 22 13
19 13 18 20 21 11
19 13 18 25 29 11
16 23 10 17 11 27
16 24 12 21 13 12
26 15 11 14 10 12
11 15 18 12 20 13

Solution:

1. Determine the value of k = 1 + 3 log n, where n = 60.
log 60 = 1.7781512
k = 1 + 3(1.7781512) = 1 + 5.3344536 = 6.3344
Therefore, 6 is the estimated number of classes in these data.

2. r = 29 – 10 = 19

3. Class size = 19/6.3344 = 2.9994, or about 3

4. Starting from the lowest value, 10, with a class size of 3, seven classes are needed to include the
highest value, 29:

Class limits    Frequency
28-30           1
25-27           4
22-24           5
19-21           10
16-18           10
13-15           11
10-12           19
i = 3           n = 60

Exercises: Construct a frequency polygon for the following data. The scores of students in a Geometry
Test.

55 63 44 37 50 57 44 57 42 46
58 40 54 65 39 27 28 56 38 45
30 35 56 78 55 27 50 28 44 28
39 37 65 43 33 70 60 61 60 44
Interpretation of Data

Any given data in statistics are useless if we don’t interpret them. The most appropriate measures
found to be useful in describing a distribution of observations are the measures of central tendency,
measures of variation, measures of relative position, z – scores, box and whisker plot, probability and
normal curve, linear regression and correlation.

Measures of Central Tendency

Central Tendency determines a numerical value in the central region of a distribution of scores.
Central tendency refers to the center of a distribution of observations. There are three measures of central
tendency: the mean, the median and the mode. These are used when the general or over-all performance
of the class is compared to other classes.

1. Mean
The mean, Mn, is also called the arithmetic mean or average. It can be affected by extreme scores.
It is used if the most reliable measure is desired and when there are a few with very high values
and a few with very low values. The mean is the balance point of score distribution.

How to compute the mean?

A. Ungrouped Data:
The mean is the balance point of a distribution.

a. Mean, Mn = (sum of the values)/(number of values)

Example 1: Jeffrey has been working on programming and updating a Web site for his company for the
past 24 months. The following numbers represent the number of hours Jeffrey has worked on
his Web site for each of the past 7 months: 24, 25, 33, 50, 53, 66, 78. What is the mean
(average) number of hours that Jeffrey worked on this Web site each month?

Solution:

Step 1: Add the numbers to determine the total number of hours he worked.

24 + 25 + 33 + 50 + 53 + 66 + 78 = 329

Step 2: Divide the total by the number of months.

Mean = 329/7 = 47. Jeffrey worked an average of 47 hours on this Web site each month.

Example 2: The following are Marivic’s scores in Statistics quizzes: 70, 72, 77, 78, 86, 84, and 79.

a. Compute the mean of the scores.
b. Show that the sum of the differences of the scores from the mean is 0.
c. Show that the mean is greatly affected by extreme values.

Solution:

a. Mean = (70 + 72 + 77 + 78 + 86 + 84 + 79)/7 = 546/7 = 78

b. To answer b, we subtract the mean from each score and sum up
the differences.
70 – 78 = -8
72 – 78 = -6
77 – 78 = -1
86 – 78 = 8
78 – 78 = 0
84 – 78 = 6
79 – 78 = 1
Sum = 0
c. Change the lowest and the highest scores. Let 20 be the lowest score and 100 be
the highest. Therefore:

Mean = (20 + 72 + 77 + 86 + 78 + 84 + 100)/7 = 517/7 = 73.86
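
As a quick check of Example 2, the three parts can be reproduced with Python's standard `statistics` module. The variable names are illustrative only, and the score lists are the ones used in the solution.

```python
from statistics import mean

# Scores as used in the solution to Example 2
scores = [70, 72, 77, 78, 86, 84, 79]
mn = mean(scores)                          # part a: 546/7 = 78

# part b: deviations from the mean always sum to zero
deviations = [x - mn for x in scores]
total_deviation = sum(deviations)

# part c: replacing two scores with extreme values shifts the mean
altered = [20, 72, 77, 86, 78, 84, 100]
altered_mean = mean(altered)               # 517/7, about 73.86

print(mn, total_deviation, round(altered_mean, 2))
```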

b. Weighted Mean: WMn = ΣfX/N

where: WMn = weighted mean
f = frequency
X = score
ΣfX = sum of the products of frequency and score
N = total frequency

Example

There are 1,000 notebooks sold at Php10 each; 500 notebooks at Php20 each; 500
notebooks at Php25 each, and 100 notebooks at Php30 each. Compute the
weighted mean.
Solution:

Prepare the frequency distribution.

Notebook’s Price (X)    f        fX
Php 10                  1,000    Php 10,000
20                      500      10,000
25                      500      12,500
30                      100      3,000
                        N = 2,100    ΣfX = Php 35,500

Therefore: WMn = ΣfX/N = 35,500/2,100 = 16.90
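
The weighted mean of the notebook example can be sketched as follows. The helper name `weighted_mean` is hypothetical and not part of any library.

```python
def weighted_mean(values, freqs):
    """Weighted mean: sum of f*X divided by the total frequency N."""
    total = sum(f * x for x, f in zip(values, freqs))   # sum of fX
    n = sum(freqs)                                      # N
    return total / n

prices = [10, 20, 25, 30]          # price per notebook (Php)
sold = [1000, 500, 500, 100]       # number of notebooks sold at each price
wm = weighted_mean(prices, sold)   # 35,500 / 2,100
print(round(wm, 2))                # about 16.90
```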

B. Grouped Data
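There are two ways to solve for the mean of grouped data or a
frequency distribution.

a. Midpoint method: Mn = ΣfXm/N
b. Assumed mean (coded deviation) method: Mn = X0 + (ΣfXc)i/N

where Mn = mean
f = frequency
Xm = class mark (midpoint of the class)
X0 = assumed mean, chosen from among the Xm values
Xc = coded deviation of each class from the class of the assumed mean (0, ±1, ±2, ...)
ΣfXm = sum of the products of frequencies and class marks
ΣfXc = sum of the products of frequencies and coded deviations
N = total frequency
i = size of the class interval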

Example:
The table below summarizes the weights of the Cubs. Find the average weight of the
cubs.

Weights of the Cubs

Weights of the Cubs (X)    f     Xm
201-210                    3     205.5
191-200                    8     195.5
181-190                    12    185.5
171-180                    11    175.5
161-170                    9     165.5
151-160                    2     155.5
i = 10                     N = 45

Reminder: The class mark is the average of the lower class limit and the upper class limit
of each class in the given frequency distribution.

Solution:
In solving for the mean of grouped data using the assumed mean, we add columns for the
class mark (Xm), the coded deviation (Xc), and fXc. Here the assumed mean is X0 = 185.5.

Weights of the Cubs    f     Xm       Xc    fXc
201-210                3     205.5    2     6
191-200                8     195.5    1     8
181-190                12    185.5    0     0
171-180                11    175.5    -1    -11
161-170                9     165.5    -2    -18
151-160                2     155.5    -3    -6
i = 10                 N = 45         ΣfXc = -21

Therefore: Mn = X0 + (ΣfXc)i/N

= 185.5 + (-21 x 10)/45

= 185.5 + (-4.6667)

= 180.8333
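
The Cubs example can be checked in Python with both the class-mark (midpoint) approach and the assumed-mean approach; the two should agree. This is a sketch under stated assumptions: the function names are hypothetical, and the classes are listed from highest to lowest as in the table, so the coded deviation is counted upward from the assumed-mean class.

```python
classes = [(201, 210), (191, 200), (181, 190), (171, 180), (161, 170), (151, 160)]
freqs = [3, 8, 12, 11, 9, 2]

def class_marks(classes):
    """Class mark = average of the lower and upper class limits."""
    return [(lo + hi) / 2 for lo, hi in classes]

def mean_midpoint(classes, freqs):
    """Mean as the sum of f*Xm divided by N."""
    xm = class_marks(classes)
    return sum(f * x for f, x in zip(freqs, xm)) / sum(freqs)

def mean_coded(classes, freqs, assumed_index):
    """Mean as X0 + (sum of f*Xc) * i / N, with X0 the class mark at assumed_index."""
    xm = class_marks(classes)
    i = classes[0][1] - classes[0][0] + 1              # class size (10 here)
    x0 = xm[assumed_index]                             # assumed mean
    # Classes run from highest to lowest, so classes above X0 get positive codes
    codes = [assumed_index - j for j in range(len(classes))]
    sum_fxc = sum(f * c for f, c in zip(freqs, codes))
    return x0 + sum_fxc * i / sum(freqs)

print(round(mean_midpoint(classes, freqs), 4))         # 180.8333
print(round(mean_coded(classes, freqs, 2), 4))         # 180.8333, with X0 = 185.5
```

Any class mark may serve as the assumed mean; choosing one near the middle simply keeps the coded deviations small.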

Exercises:

1. The sizes of pants sold during one business day in a department store are 32, 38, 34, 42, 36, 34,
40, 44, 32, 34. Find the average size of the pants sold.
2. Given the frequency distribution for the weights of 50 pieces of luggage, find the mean weight.

Weight (kilograms) Number of Pieces, f


7-9 2
10-12 8
13-15 14
16-18 19
19-21 7
N 50

2. MEDIAN

The median, Md, is the value in the distribution that divides an arranged (ascending/descending) set
into two equal parts. It is the midpoint or middlemost of a distribution of scores. Fifty percent of the scores
fall above it and 50% fall below it. It is also known as the 50th percentile. It is not affected by extreme
scores. This is used when the distribution of scores is skewed. The median separates the distribution into
two equal parts.

How to compute the median?

A. Ungrouped Data

The median is obtained by inspecting the middlemost value of the distribution arranged in either
ascending or descending order. It can also be located as the [(N + 1)/2]th score after the data are
arranged.

Examples:

1. Find the median of the following prices:

Php 50, Php 55, Php 60, Php 65, Php 12, Php 35, Php 48

Solution:

Php 12, Php 35, Php 48, Php 50, Php 55, Php 60, Php 65, N = 7

Therefore:

Md = (N + 1)/2 = 4th score

Md = 50

2. Find the median of the following weights in kilos.

101, 107, 115, 120, 111, 105

Solution: Arranging the numbers in ascending order.

101, 105, 107, 111, 115, 120

N=6

Md = (N + 1)/2th score

Md = (6 + 1)/2 = 3.5th score, that is between the 3rd and the 4th scores.

Md = (107 + 111)/2 = 109
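
The (N + 1)/2 position rule used in both examples can be sketched as follows. The function name `median_by_position` is hypothetical; Python's built-in `statistics.median` gives the same results.

```python
def median_by_position(values):
    """Median located at the (N + 1)/2-th score of the arranged data."""
    ordered = sorted(values)
    n = len(ordered)
    pos = (n + 1) / 2                      # 1-based position of the median
    if pos.is_integer():
        return ordered[int(pos) - 1]       # odd N: the middle score itself
    lower = ordered[int(pos) - 1]          # even N: average the two middle scores
    upper = ordered[int(pos)]
    return (lower + upper) / 2

print(median_by_position([50, 55, 60, 65, 12, 35, 48]))      # 50 (4th score)
print(median_by_position([101, 107, 115, 120, 111, 105]))    # 109.0 (3.5th score)
```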

B. Grouped Data

In computing the median of grouped data, determine the
median class, which contains the (N/2)th score under <cf of the
cumulative frequency distribution. To solve for the median, we use
the formula:

Md = XLB + [(N/2 – cfb)/fm]i

where Md = median
XLB = the lower boundary or true lower limit of the median class
N = total frequency
cfb = cumulative frequency before the median class
fm = frequency of the median class
i = size of the class interval

Example: Solve for the median for the following data.

Statistics Test Results

Class    f     <cf
28-29    1     60
26-27    3     59
24-25    3     56
22-23    3     53
20-21    6     50
18-19    6     44
16-17    8     38
14-15    6     30 = median class
12-13    10    24 = cfb
10-11    14    14
N = 60

Solution:

N/2th score = (60/2)th score

= 30th score

The median class that contains the 30th score is 14-15 since it has a 30th score.

XLB = 13.5

cfb = 24

fm = 6

i =2

Md = XLB + [(N/2 – cfb)/fm]i

Md = 13.5 + [(60/2 – 24)/6]2

Md = 13.5 + [(6)(2)]/6

Md = 13.5 + 2

Md = 15.5

This means that 50 percent of the students got a score below 15.5 or if the passing score is 50 percent of
the total number of items, almost half of the class failed in the test.
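
The grouped median computation above can be sketched in Python. The function name `grouped_median` is hypothetical, the classes are assumed to be listed in ascending order, and the class limits are assumed to be whole numbers so that each lower boundary is the lower limit minus 0.5.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: XLB + (N/2 - cfb) * i / fm."""
    n = sum(freqs)
    i = classes[0][1] - classes[0][0] + 1     # class size
    cfb = 0                                   # cumulative frequency before the class
    for (lower, _upper), fm in zip(classes, freqs):
        if cfb + fm >= n / 2:                 # first class whose <cf reaches N/2
            xlb = lower - 0.5                 # lower boundary of the median class
            return xlb + (n / 2 - cfb) * i / fm
        cfb += fm

# Statistics Test Results, listed from the lowest class upward
classes = [(10, 11), (12, 13), (14, 15), (16, 17), (18, 19),
           (20, 21), (22, 23), (24, 25), (26, 27), (28, 29)]
freqs = [14, 10, 6, 8, 6, 6, 3, 3, 3, 1]
print(grouped_median(classes, freqs))         # 15.5
```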

Exercises:
1. The ages of 10 administrators in a certain college are given as follows. Compute the median.

40, 38, 45, 51, 44, 53, 59, 45, 56, 45

2. Compute the median given the following data:

Scores in Statistics (X) f cf


75-79 6 60
70-74 7 54
65-69 2 47
60-64 8 45
55-59 12 37 = median class
50-54 7 25
45-49 10 18
40-44 8 8
N = 60 60
Solution:

N/2th score = (60/2)th score

= 30th score

cfb = 25
fm = 12
i =5
XLB = 54.5

Md = XLB + [(N/2 – cfb)/fm]i

Md = 54.5 + [(60/2 – 25)/12]5

Md = 54.5 + 2.083333

Md = 56.58

3. Mode

The mode is the value with the largest frequency. It is the value that occurs most frequently in
the distribution. This is used when the quickest estimate of typical performance is wanted. A
distribution can be unimodal with one mode value, bimodal with two mode values and trimodal
with three mode values. In other words, it can have more than one mode.

How to find the mode?


A. The mode of Ungrouped Data
The mode of ungrouped data is found by mere inspection.
Example.
1. Find the mode of the following discounts.
4%, 7%, 7%, 7%, 8%, 8%, 9%, 10%, 11%, 11%, 13%

Solution:

By inspection, the mode is 7% since it has the largest frequency.

B. The Mode of Grouped Data


To find the mode of grouped data, determine first the modal class. The modal
class is the class with the highest frequency. Then use the formula:

Mode, M0 = XLB + [df1/(df1 + df2)]i

Where:
M0 = mode
XLB = lower boundary of the modal class
df1 = difference between the frequency of the modal class and the
frequency of the class below it
df2 = difference between the frequency of the modal class and the
frequency of the class above it
i = size of the class interval

Example: Find the mode of the following data:

Statistic Test Result

Class Frequency f
28-29 1
26- 27 3
24-25 3
22-23 3
20-21 6
18-19 6
16-17 8
14-15 6
12-13 10
10-11 14 = modal class
N= 60
Solution:

M0 = XLB + [df1/(df1 + df2)]i

XLB = 9.5

df1 = 14 – 0, because there is no frequency below the modal class
= 14

df2 = 14 – 10
= 4

i = 2

M0 = 9.5 + [14/(14 + 4)]2

= 9.5 + 1.56

= 11.06
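
The grouped mode formula can be sketched in Python. The function name `grouped_mode` is hypothetical; classes are assumed to be in ascending order with whole-number limits, a missing neighbouring class is treated as frequency 0 (as in the solution above), and when several classes tie for the highest frequency this sketch uses the first one.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: XLB + df1 / (df1 + df2) * i."""
    i = classes[0][1] - classes[0][0] + 1                    # class size
    m = freqs.index(max(freqs))                              # position of the modal class
    f_below = freqs[m - 1] if m > 0 else 0                   # class with lower values
    f_above = freqs[m + 1] if m < len(freqs) - 1 else 0      # class with higher values
    df1 = freqs[m] - f_below
    df2 = freqs[m] - f_above
    xlb = classes[m][0] - 0.5                                # lower boundary of the modal class
    return xlb + df1 / (df1 + df2) * i

# Statistics Test Result, listed from the lowest class upward
classes = [(10, 11), (12, 13), (14, 15), (16, 17), (18, 19),
           (20, 21), (22, 23), (24, 25), (26, 27), (28, 29)]
freqs = [14, 10, 6, 8, 6, 6, 3, 3, 3, 1]
print(round(grouped_mode(classes, freqs), 2))                # 11.06
```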

EXERCISES

1. Find the mode in the following data:


1 5 6 9 11 15 17
2 5 7 9 12 15 17
3 5 7 9 12 15 18
4 6 8 12 10 16 18
4 6 9 12 11 16 18

2. Solve for the mode, given the frequency distribution:

Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60

Solution:

M0 = XLB + [df1/(df1 + df2)]i

XLB = 54.5

df1 = 12 – 7 = 5

df2 = 12 – 8 = 4

i = 5

M0 = 54.5 + [5/(5 + 4)]5

M0 = 54.5 + 2.77777778

M0 = 57.28

Measure of Relative Position

Just as the median divides the set of scores into two equal parts, there are other measures
that divide a distribution into one hundred, four, or ten equal parts. These are the other measures of
position: the percentiles, the quartiles, and the deciles.

How to compute for the percentiles, quartiles, and deciles?

A. The Percentiles
One way of assessing performance is by the use of percent. The percentiles are
the score-points that divide a distribution into 100 equal parts. For example, the
10th percentile (P10) separates the lowest 10% from the other 90%; the 25th
percentile (P25) separates the lowest 25% from the other 75%; and the 80th
percentile (P80) separates the lowest 80% from the other 20%.
Consider this situation: Juan got a score of 60 and ranked ninth (9th) in
a class of 150 students. This means that 150 – 9 = 141 students rank below him. If we
get the percentage, 141/150 = 0.94 = 94%. This means that 94% of the class ranked
below or got scores below Juan. Then we can say that the percentile rank of Juan
in the class is 94, which also implies that 94 out of 100 students got scores below
his score. The 8 students ranked above him, about 5% of the class, obtained higher scores.
The percentile rank tells what percent of the cases fall below a given
rank or position. The score of Juan is 60, so we can say that the 94th percentile point is
60. The percentile point is the score or value that corresponds to the given
percentile rank. It is denoted by the symbol Pn, where n is the percentile rank.
Thus in the example, P94 = 60.
a. Ungrouped Data
Examples:
1. Mrs. Corpuz gave a quiz to ten students. The scores obtained are as follows:
5, 8, 7, 6, 3, 6, 10, 5, 6, 4
a. What score corresponds to the 100th percentile?
b. What is the 50th percentile point?

Solution:

a. Arrange the scores in descending order.
10, 8, 7, 6, 6, 6, 5, 5, 4, 3
The highest is 10, the middle is 6, and the lowest is 3.
The one who scored 10 surpassed all the others. Since a class interval
always has an upper boundary, the 100th percentile point is the upper
boundary of the highest score: P100 = 10.5.
b. Since the middle score is 6, it surpasses half (50%) of the students. Therefore,
P50 = 6.
2. In a class of 50, Jason got a percentile rank of 65.
a. What does this percentile rank imply?
b. How many students rank below Jason?

Solution:

a. The P65 implies that Jason got a score higher than 65 percent of the class.
b. Since there are 50 students in all, the number of students who got scores below
Jason is 50(65%) = 50(0.65) = 32.5, or about 32 students.
3. In a group of 20 boys, John has a height corresponding to a percentile rank of 80. This means that
20(0.80) = 16 boys are shorter than John, so the number of boys who are taller than John is 20 – 16 – 1 = 3.

b. Grouped Data

To compute the percentile of given grouped data, the following formula is used:

Pn = XLB + [i(nN – F)/f]

where: Pn = the score corresponding to the nth percentile rank
XLB = the lower boundary of the percentile class interval
f = the frequency of the percentile interval
F = the cumulative frequency of the interval before the percentile interval
i = the class size
n = the percentile rank in decimals
N = the total frequency

Example: Find P72.

Statistics Test Results

Class    f     <cf
28-29    1     60
26-27    3     59
24-25    3     56
22-23    3     53
20-21    6     50
18-19    6     44 = percentile interval (0.72 x 60 = 43.20 is found in this class)
16-17    8     38
14-15    6     30
12-13    10    24
10-11    14    14
N = 60

Solution:

Pn = P72    n = 72% = 0.72    XLB = 17.5
f = 6    F = 38    i = 2    N = 60

P72 = 17.5 + 2[(0.72)(60) – 38]/6

= 17.5 + 2(5.2)/6

= 17.5 + 1.73

= 19.23

EXERCISE: Solve for P30, P65, and P70.

Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60

B. The Quartiles

The quartiles are points that divide a distribution into four equal parts.

Consider that Q1 = P25, Q2 = P50, Q3 = P75, and Q4 = P100. The lower quartile is Q1
and the upper quartile is Q3.

To compute the quartiles, the following formula is used:

Qn = XLB + [i(nN/4 – F)/f]

Where: Qn = the score corresponding to the nth quartile (n = 1, 2, 3)
XLB = the lower boundary of the quartile class interval
f = the frequency of the quartile interval
F = the cumulative frequency of the interval before the quartile interval
i = the class size
4 = stands for the quartile division
N = the total frequency

Exercise: Compute Q1 and Q3.

Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60
C. The Deciles

The deciles are points that divide a distribution into ten equal parts.

Each part is called a decile. So, D1 = P10, D2 = P20, ..., D10 = P100.

To compute the deciles, the following formula is used:

Dn = XLB + [i(nN/10 – F)/f]

Where: Dn = the score corresponding to the nth decile (n = 1, 2, ..., 9)
XLB = the lower boundary of the decile class interval
f = the frequency of the decile interval
F = the cumulative frequency of the interval before the decile interval
i = the class size
10 = stands for the decile division
N = the total frequency

Example:

1. Given the frequency distribution below, calculate the following:

Statistics Test Result

Class      f     <cf
60 – 62    2     40
57 – 59    2     38
54 – 56    4     36
51 – 53    5     32
48 – 50    11    27
45 – 47    8     16
42 – 44    4     8
39 – 41    2     4
36 – 38    1     2
33 – 35    1     1
N = 40

Find Q1, P10, and D2.

Solution:

a. In order to remember only one formula, we can use Q1 = P25.

n = 25% = 0.25 and 0.25 x 40 = 10, which is found in the class 45 – 47 (<cf = 16), the percentile interval.

XLB = 44.5    f = 8    F = 8    i = 3    N = 40

Q1 = P25 = XLB + [i(nN – F)/f] = 44.5 + (3)[{(0.25)(40) – 8}/8]

= 44.5 + 0.75

= 45.25, therefore Q1 = 45.25

b. P10 = XLB + [i(nN – F)/f]

n = 10% = 0.10 and 0.10 x 40 = 4, which is found in the class 39 – 41 (<cf = 4), the percentile interval.

XLB = 38.5    f = 2    F = 2    i = 3    N = 40

P10 = 38.5 + 3[{(0.10)(40) – 2}/2]

= 38.5 + 3

= 41.5

c. Solve for D2 using P20.

D2 = P20 = XLB + [i(nN – F)/f]

n = 20% = 0.20 and 0.20 x 40 = 8, which is found in the class 42 – 44 (<cf = 8), the percentile interval.

XLB = 41.5    f = 4    F = 4    i = 3    N = 40

P20 = 41.5 + 3[{(0.20)(40) – 4}/4]

= 41.5 + 3

= 44.5, therefore D2 = 44.5
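
Since quartiles and deciles can be written as percentiles, one function is enough for all three. The sketch below is illustrative only: `grouped_percentile` is a hypothetical name, the classes are assumed to be in ascending order, and the limits are assumed to be whole numbers so that the lower boundary is the lower limit minus 0.5.

```python
def grouped_percentile(classes, freqs, p):
    """Score point below which a fraction p of the cases fall.

    classes: (lower limit, upper limit) pairs in ascending order
    freqs:   frequency of each class
    p:       percentile rank as a decimal, e.g. 0.25 for P25
    """
    n = sum(freqs)
    i = classes[0][1] - classes[0][0] + 1       # class size
    target = p * n                              # nN in the formula
    cum = 0                                     # F: cumulative frequency before the class
    for (lower, _upper), f in zip(classes, freqs):
        if f > 0 and cum + f >= target:         # first class whose <cf reaches nN
            xlb = lower - 0.5                   # lower boundary of that class
            return xlb + i * (target - cum) / f
        cum += f
    raise ValueError("p must be between 0 and 1")

# Statistics Test Result (N = 40), listed from the lowest class upward
classes = [(33, 35), (36, 38), (39, 41), (42, 44), (45, 47),
           (48, 50), (51, 53), (54, 56), (57, 59), (60, 62)]
freqs = [1, 1, 2, 4, 8, 11, 5, 4, 2, 2]

q1 = grouped_percentile(classes, freqs, 0.25)    # Q1 = P25
p10 = grouped_percentile(classes, freqs, 0.10)   # P10
d2 = grouped_percentile(classes, freqs, 0.20)    # D2 = P20
print(q1, p10, d2)                               # 45.25 41.5 44.5
```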

EXERCISES: Solve for D3, D7, D9

Scores in Algebra f
75-79 6
70-74 7
65-69 2
60-64 8
55-59 12
50-54 7
45-49 10
40-44 8
N 60

Measure of Variation

Measures of variation describe the degree of spread of the values. They are also
commonly called measures of dispersion. There are six measures of
variation: the range, the quartile deviation, the interquartile range, the mean deviation, the variance,
and the standard deviation. They are used to determine how varied, dispersed, scattered, or distant,
or how close, clustered, or near the performances of the members of the group are. They also describe the
heterogeneity and homogeneity of the group.

How to compute the range?

A. Range
The range is the difference between the highest score (h.s) and the lowest score (l.s). It gives
us the quickest estimate. It shows the two extreme scores of a set of data. For grouped data,
the range can be calculated by subtracting the lower boundary (l.b) of the lowest class
interval from the upper boundary (u.b) of the highest class interval.

Examples.
1. Find the range of the following data:
a. 10, 12, 12, 14 : R = 14 – 10 = 4
b. 80, 100, 100, 120 : R = 120 – 80 = 40
c. 45, 50, 50, 55 : R = 55 – 45 = 10

2. Find the range of the frequency distribution below.


Statistic Test Results
Class Frequency F
28-29 1
26-27 3
24-25 3
22-23 3
20-21 6
18-19 6
16-17 8
14-15 6
12-13 10
10-11 14
N=60

Recall: Range = upper class boundary of the topmost class – lower class boundary of the bottom
class.

Solution:

Range = u.b – l.b

= 29.5 – 9.5

= 20

EXERCISES

1. Find the range:


1 5 6 9 11 15 17
2 5 7 9 12 15 17
3 5 7 9 12 15 17
4 6 8 12 10 16 18
4 6 9 12 11 16 18

2. Solve for the Range.

Class Interval f
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30

How to compute for the interquartile range and quartile deviation

B. The Interquartile Range


The interquartile range is a more reliable measure of variability. It is the difference between
the 75th percentile, or Q3, and the 25th percentile, or Q1. Hence, 25 percent of the scores fall below Q1,
25 percent fall above Q3, and the interquartile range covers the middle 50 percent.
I.R = Q3 – Q1
Example: Solve for the interquartile range, given Q1 = 45.25 and Q3 = 52.3
Solution:
IR = 52.3 – 45.25 = 7.05

EXERCISES

1. Solve for the IR, given Q1 = 69.81 and Q3 = 87.9


2. Compute the IR using the following data.

Class Interval f
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30

C. The Quartile Deviation


If we want half of the interquartile range, we simply divide the
difference between Q3 and Q1 by two. This value is called the quartile deviation, QD.
Q.D = (Q3 – Q1)/2
Example: Solve for the quartile deviation, given Q1 = 45.25 and Q3 = 52.3
Solution:
QD = (52.3 – 45.25)/2 = 7.05/2 = 3.525
EXERCISES Solve for the QD.

Class Interval f
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30

How to compute for the mean deviation

D. The Mean Deviation


The mean deviation is a measure of variation that makes use of all the scores in a
distribution. This is more reliable than the range and quartile deviation.
a. Ungrouped Data
To solve the mean deviation for ungrouped data, we use the formula:

MD = Σ|X – Mn|/N

Where:
X = the score in the distribution
Mn = the mean
N = the number of observations
Example. Find the mean deviation of the following ungrouped distribution: 4, 8, 12.
Solution:
Calculate the mean: Mn = 24/3 = 8
X     |X – Mn|
4     4
8     0
12    4
Σ|X – Mn| = 8
MD = 8/3 = 2.67
b. Grouped Data
For a grouped frequency distribution, the formula is

MD = Σf|X – Mn|/N or MD = Σf|Xm – Mn|/N

Example: Find the mean deviation of the following.


X f
30-34 4
25-29 5
20-24 6
15-19 2
10-14 3
N = 20

Solution:

Calculate the mean using the midpoint method, Mn = ΣfXm/N. We add columns for Xm and fXm.

X        f    Xm    fXm
30-34    4    32    128
25-29    5    27    135
20-24    6    22    132
15-19    2    17    34
10-14    3    12    36
N = 20        ΣfXm = 465

Mn = 465/20 = 23.25

Add the columns |Xm – Mn| and f|Xm – Mn|.

X        f    Xm    fXm    |Xm – Mn|    f|Xm – Mn|
30-34    4    32    128    8.75         35
25-29    5    27    135    3.75         18.75
20-24    6    22    132    1.25         7.50
15-19    2    17    34     6.25         12.50
10-14    3    12    36     11.25        33.75
N = 20                     Σf|Xm – Mn| = 107.50

MD = Σf|Xm – Mn|/N = 107.50/20 = 5.375
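
Both mean deviation examples can be checked with a short Python sketch. The function names are hypothetical; the grouped version uses the class marks, as in the solution above.

```python
def mean_deviation(values):
    """Ungrouped data: average absolute deviation from the mean."""
    n = len(values)
    mn = sum(values) / n
    return sum(abs(x - mn) for x in values) / n

def grouped_mean_deviation(classes, freqs):
    """Grouped data: sum of f * |Xm - Mn| divided by N."""
    n = sum(freqs)
    xm = [(lo + hi) / 2 for lo, hi in classes]               # class marks
    mn = sum(f * x for f, x in zip(freqs, xm)) / n           # mean by the midpoint method
    return sum(f * abs(x - mn) for f, x in zip(freqs, xm)) / n

print(round(mean_deviation([4, 8, 12]), 2))                  # 2.67

classes = [(30, 34), (25, 29), (20, 24), (15, 19), (10, 14)]
freqs = [4, 5, 6, 2, 3]
print(grouped_mean_deviation(classes, freqs))                # 5.375
```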
EXERCISE
Find the mean deviation.

1. 32, 35, 26, 15


2. Given
Class Interval F
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30
How to compute for the variance and the standard deviation

E. The Variance and the Standard Deviation


The standard deviation, SD, is the most important and useful measure of variation. It is the
square root of the variance, SD². It is an average of the degree to which each score
in the distribution deviates from the mean value. It is a more stable measure of variation
because it involves all of the scores in a distribution, unlike the range.

A. Ungrouped Data

For ungrouped data, the formula is S = √[Σ(X – Mn)²/(N – 1)]

a. Calculate the mean.
b. Get the difference between each score and the mean, then square this
difference.
c. Get the sum of the squared deviations in step b.
d. Substitute in the formula.

Example: Find the variance and the standard deviation of the following distribution: 4, 8, 12.

Mn = (4 + 8 + 12)/3 = 8

X     (X – Mn)²
4     16
8     0
12    16
Σ(X – Mn)² = 32

S = √[Σ(X – Mn)²/(N – 1)] = √[32/(3 – 1)] = √16 = 4

S² = 4² = 16
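
The four steps for ungrouped data can be sketched in Python and compared with the standard library. The function name `sample_std` is hypothetical; `statistics.stdev` and `statistics.variance` use the same N – 1 divisor.

```python
import math
from statistics import stdev, variance

def sample_std(values):
    """Standard deviation with the N - 1 divisor, following steps a to d."""
    n = len(values)
    mn = sum(values) / n                            # a. mean
    squared = [(x - mn) ** 2 for x in values]       # b. squared deviations
    ss = sum(squared)                               # c. sum of squared deviations
    return math.sqrt(ss / (n - 1))                  # d. substitute in the formula

data = [4, 8, 12]
s = sample_std(data)
print(s, s ** 2)                                    # 4.0 16.0
print(stdev(data), variance(data))                  # same values from the standard library
```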

B. Grouped Data
For data organized in a frequency distribution, the standard deviation is
computed this way:

S = √[Σf(X – Mn)²/(N – 1)], where X is the class mark (Xm) of each class

Example: Solve for the standard deviation and the variance.

C.I        f    Xm    Xm – Mn    (Xm – Mn)²    f(Xm – Mn)²
54 – 56    3    55    14.40      207.36        622.08
51 – 53    2    52    11.40      129.96        259.92
48 – 50    1    49    8.40       70.56         70.56
45 – 47    5    46    5.40       29.16         145.80
42 – 44    6    43    2.40       5.76          34.56
39 – 41    8    40    -0.60      0.36          2.88
36 – 38    4    37    -3.60      12.96         51.84
33 – 35    6    34    -6.60      43.56         261.36
30 – 32    2    31    -9.60      92.16         184.32
27 – 29    3    28    -12.60     158.76        476.28
N = 40                           Σf(Xm – Mn)² = 2109.60

Solution: Solve for the mean: Mn = ΣfXm/N = 1624/40 = 40.60

S = √[Σf(Xm – Mn)²/(N – 1)] = √[2109.60/(40 – 1)] = √54.09 = 7.35

S² = 2109.60/39 = 54.09
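
The grouped computation can be reproduced from the class marks and frequencies alone, which is a convenient way to check each column of the table. This is a sketch; `grouped_std` is a hypothetical name and the N – 1 divisor follows the formula given for grouped data.

```python
import math

def grouped_std(marks, freqs):
    """Return (mean, variance, standard deviation) from class marks and frequencies."""
    n = sum(freqs)
    mn = sum(f * x for f, x in zip(freqs, marks)) / n            # mean from the class marks
    ss = sum(f * (x - mn) ** 2 for f, x in zip(freqs, marks))    # sum of f * (Xm - Mn)^2
    var = ss / (n - 1)
    return mn, var, math.sqrt(var)

marks = [55, 52, 49, 46, 43, 40, 37, 34, 31, 28]   # class marks Xm
freqs = [3, 2, 1, 5, 6, 8, 4, 6, 2, 3]             # frequencies f
mn, var, sd = grouped_std(marks, freqs)
print(round(mn, 2), round(var, 2), round(sd, 2))
```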

EXERCISE Solve for the standard deviation.

1. 10.2, 13.7, 18.5, 20.8
2. Given:
Class Interval f
25-29 5
20-24 6
15-19 7
10-14 8
5-9 4
N 30

z-scores
A z-score (also known as a standard score) measures how many standard deviations an
observation is above or below the mean. A positive z-score gives the number of standard
deviations a score is above the mean, and a negative z-score gives the number of standard
deviations a score is below the mean. The z-score can be computed using the formula

z = (x − µ)/σ

where x is the score, µ is the mean, and σ is the standard deviation.
Example 1: John got 76 marks in his Statistics test. If the marks of the whole class had a mean
of 52 and a standard deviation of 8, what was John’s standard score?
Solution:

z = (76 − 52)/8 = 3

Example 2: Given the mean of 55 and standard deviation of 8, what score corresponds to two
standard deviations above the mean?
Solution:
z = (x − µ)/σ
2 = (x − 55)/8
16 = x – 55
x = 16 + 55 = 71
Example 3: Given the following data, in which subject did Roel perform most poorly?

Subjects Roel’s Score Mean Standard Deviation


Math 90 85 1.5
English 95 97 2.0
MAPEH 94 92 1.75

Z(Math) = (x − µ)/σ = (90 − 85)/1.5 = 3.33
Z(English) = (95 − 97)/2.0 = −2/2 = −1
Z(MAPEH) = (94 − 92)/1.75 = 2/1.75 = 1.14

Therefore, Roel performed most poorly in English because English has the smallest
z-score.
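
The three z-score examples can be verified with a few lines of Python. The helper `z_score` and the dictionary layout are illustrative choices, not part of the text.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(76, 52, 8))          # Example 1: 3.0
print(55 + 2 * 8)                  # Example 2: score two SDs above the mean, 71

# Example 3: (score, mean, standard deviation) for each subject
roel = {"Math": (90, 85, 1.5), "English": (95, 97, 2.0), "MAPEH": (94, 92, 1.75)}
z = {subject: z_score(*values) for subject, values in roel.items()}
weakest = min(z, key=z.get)        # subject with the smallest z-score
print({s: round(v, 2) for s, v in z.items()}, weakest)
```

Comparing z-scores rather than raw scores matters here: Roel's raw English score (95) is his highest, yet it is the only one below its class mean.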

EXERCISES

1. Complete the table.

X µ 𝜎 𝑧
1 9.5 5 3
2 -10 9 0.1
3 32.1 7 2.3
4 14 4.5 -0.7
5 -19 -7 -2.4

2. The standard score of Mary in the Chinese Test is 1.2 and her standard score in the
English Test is 0.8. In what subject did she perform better?
3. Kay's and Ed’s results in a grammar test and a speech test among 100 pupils are in the
table below:
Grammar Speech
Kay 82 65
Ed 77 70
Mean 65 70
Sd 15 10

a. In what subject did Kay perform poorly?


b. In what subject did Ed perform better?
c. Find the total standard score of the two students
d. Who has the best overall performance in the tests?

Box and Whisker Plot


A box and whisker plot (sometimes called boxplot) is a graph that presents
information from a five-number summary. It is a way of summarizing a set of data measured on
an interval scale.

Figure 1. Box and Whisker plot
[Figure: a box spanning the lower quartile (Q1) to the upper quartile (Q3), with the median (Q2) marked
inside the box, and whiskers extending to the lowest and highest observations]

Example 1: Draw the box-and-whisker plot for the following data set:
77, 79, 80, 86, 87, 87, 94, 99
Solution:
Find the minimum value, Q1, Q2, Q3 and the maximum value.
min: 77 max: 99

Q2 = (86 + 87)/2 = 86.5

Subset 1: 77, 79, 80, 86 median: Q1 = (79+80)/2 = 79.5


Subset 2: 87, 87, 94, 99 median: Q3 = (87+94)/2 = 90.5
Therefore: min:77, Q1 = 79.5, Q2 = 86.5, Q3 = 90.5, max: 99

[Figure: box-and-whisker plot of the data drawn above a number line marked 80, 90, 100]
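
The five-number summary behind a box-and-whisker plot can be sketched in Python using the same approach as the solution: split the ordered data into a lower and an upper subset and take the median of each. The function name is hypothetical, and leaving the middle value out of both subsets when the count is odd is an assumption. Other quartile conventions, such as the default method of `statistics.quantiles`, can give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                        # subset 1
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]  # subset 2
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [77, 79, 80, 86, 87, 87, 94, 99]
print(five_number_summary(data))     # (77, 79.5, 86.5, 90.5, 99)
```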

Example 2: Draw a box-and-whisker plot for the following data set and find the outliers.
4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4
Solution:
Arrange the values in order to find the median.
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
Median = Q2 = 4.4
Subset 1: 3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4 median=Q1 = (4.3+4.3)/2=4.3
Subset 2: 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1 median=Q3 = (4.7+4.8)/2=4.75
Therefore: min: 3.9, Q1 = 4.3,
