EDA Topic 3

The document discusses different methods for calculating the mean or average of a data set, including the arithmetic mean, weighted arithmetic mean, and finding the mean for grouped data. It provides examples of calculating the mean for both ungrouped and grouped data.

Uploaded by

Julimar Cabaya

MATH 122 : ENGINEERING DATA ANALYSIS

INSTRUCTIONAL MATERIAL #3

Anyone who works with statistics needs a set of quantitative measures that give a glimpse of the form of
a distribution and of the characteristics of the population from which the data were collected; that is,
measures that summarize the data. In particular, one computes a single number that is typical of the general
level of magnitude of the measurements in a set. This single figure, the point around which the scores
concentrate, is called a measure of central tendency. The three common measures of central tendency are the
mean, the median, and the mode.

THE MEAN

The Simple Arithmetic Mean for Ungrouped Data

The arithmetic mean, or simply the mean (commonly called the average), is found by adding the scores
together and dividing the sum by the number of scores.

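Symbolically,

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}$$

$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

where:
$\mu$ = population mean
$\bar{x}$ = sample mean
$X_i$ = the value of the $i$th observation
$N$ = population size
$n$ = sample size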
Example 1. Here is an array of students' scores in a quiz. Compute the mean score.

35, 42, 45, 48, 49, 50, 51, 52, 53, 55
55, 56, 57, 57, 57, 60, 61, 62, 64, 64
65, 65, 68, 69, 70, 71, 71, 72, 73, 75
Solution:

$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{35 + 42 + \dots + 75}{30} = \frac{1772}{30}$$

$\bar{x} \approx 59.07$, or about 59
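The arithmetic in Example 1 can be checked with a short script. This is only a sketch: the helper name `arithmetic_mean` is not part of the handout, and the list simply retypes the thirty scores given above.

```python
# Quiz scores from Example 1 (30 students).
scores = [
    35, 42, 45, 48, 49, 50, 51, 52, 53, 55,
    55, 56, 57, 57, 57, 60, 61, 62, 64, 64,
    65, 65, 68, 69, 70, 71, 71, 72, 73, 75,
]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(scores)
x_bar = arithmetic_mean(scores)
print(total, len(scores), round(x_bar, 2))  # 1772 30 59.07
```

The printed sum and count are the numerator and denominator of the definition form, so the result can be compared with the hand computation term by term.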
The sample mean, denoted by $\bar{x}$, is generally used as an estimator (or predictor) of $\mu$, the
mean of the population, which is unknown.
When n is small, the definition form $\bar{x} = \frac{\sum X}{n}$ can be used to compute the mean directly.
When n is large, say 50, 100, or more, that method is not practical. Instead, a frequency distribution is
constructed, from which the statistical measures needed to describe the distribution can be computed.

INSTRUCTOR: ENGR. YVONNE ANGELYN R. ALIAS


Weighted Arithmetic Mean

Another method for calculating the mean is to multiply each score by its corresponding frequency, add the
products, and divide by the total frequency (the number of scores). Here, consider a frequency distribution of
ungrouped data, which shows the number of times each variate occurs. In symbols,

$$\bar{x} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_k f_k}{f_1 + f_2 + \dots + f_k}$$

where:
$X_i$ = the $i$th distinct value of X in the set
$f_i$ = frequency of the corresponding score
$k$ = number of different values of X in the set

Example 2.

Xi      fi      fiXi
60       4      240
58       8      464
65      12      780
63       5      315
52      10      520
55      13      715
50      15      750
70       8      560
56      11      616
67       9      603
    ∑ fi = 95   ∑ fiXi = 5563
Each value of $X_i$ above is weighted by its corresponding frequency, $f_i$. The sum of the products in the
third column is 5563; dividing it by the total frequency, 95, gives the weighted arithmetic mean,
$\bar{x} = 5563 / 95 \approx 58.56$.
This long method of computing the mean is convenient when there are not more than 30 cases.
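The frequency-weighted computation of Example 2 can be sketched in a few lines. The function name `frequency_mean` and the list-of-pairs layout are illustrative choices, not something the handout prescribes.

```python
# (score, frequency) pairs from the Example 2 table.
table = [(60, 4), (58, 8), (65, 12), (63, 5), (52, 10),
         (55, 13), (50, 15), (70, 8), (56, 11), (67, 9)]


def frequency_mean(pairs):
    """Return (mean, sum of f*X, sum of f) for (score, frequency) pairs."""
    total_f = sum(f for _, f in pairs)
    total_fx = sum(x * f for x, f in pairs)
    return total_fx / total_f, total_fx, total_f


x_bar, sum_fx, sum_f = frequency_mean(table)
print(sum_f, sum_fx, round(x_bar, 2))  # 95 5563 58.56
```

Returning the two column totals alongside the mean makes it easy to compare each with the totals row of the table.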
For other types of data, the weights may be denoted by $W_1, W_2, W_3, \dots, W_k$, which may express the
importance attached to each variate. So

$$\bar{x} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_k X_k}{W_1 + W_2 + \dots + W_k} \quad \text{or} \quad \bar{x} = \frac{\sum WX}{\sum W}$$

Example 3. In the performance evaluation of teachers, suppose the supervisor's evaluation is given a weight of
5, the self-evaluation 2, the peers' evaluation 2, and the clients' evaluation 1. If a teacher's ratings are 90,
95, 85, and 90, respectively, the mean rating of the teacher is:

$$\bar{x} = \frac{5(90) + 2(95) + 2(85) + 1(90)}{10} = \frac{900}{10} = 90$$
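Because the weighted mean divides by the sum of the weights, only the relative sizes of the weights matter. The sketch below reproduces Example 3 and then repeats it with the weights rescaled to shares of the total (0.5, 0.2, 0.2, 0.1). The names `weighted_mean` and `shares` are illustrative only.

```python
import math


def weighted_mean(values, weights):
    """Sum of W*X divided by the sum of W."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


ratings = [90, 95, 85, 90]   # supervisor, self, peers, clients
weights = [5, 2, 2, 1]

# Same weights expressed as proportions of the total weight.
shares = [w / sum(weights) for w in weights]

by_weights = weighted_mean(ratings, weights)
by_shares = weighted_mean(ratings, shares)
print(by_weights, math.isclose(by_weights, by_shares))
```

Both calls give a rating of 90, up to floating-point rounding in the second case.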

Finding the Mean for Grouped Data

The short method of computing the mean is convenient for more than 30 cases because it involves only
small numbers.

Example 1. Find the mean of the following test scores.

30 33 54 44 53 49 46 44
32 35 57 43 56 50 45 43
31 34 51 39 52 49 46 42
28 33 52 41 51 45 47 44
27 34 53 36 48 42 37 38

Table 1. Computation of the Mean of Test Scores of Forty College Students

Class Interval    Midpoint     f      d     fd
57 – 59              58        1      5      5
54 – 56              55        2      4      8
51 – 53              52        6      3     18
48 – 50              49        4      2      8
45 – 47              46        5      1      5
42 – 44              43        7      0      0
39 – 41              40        2     -1     -2
36 – 38              37        3     -2     -6
33 – 35              34        5     -3    -15
30 – 32              31        3     -4    -12
27 – 29              28        2     -5    -10
                           n = 40        ∑ fd = −1

The formula for computing the mean is

$$M = AM + \left(\frac{\sum fd}{n}\right) ci$$

where:
AM = assumed mean
∑ fd = algebraic sum of the products of the frequencies and the corresponding deviations from
the assumed mean
n = number of cases (the total frequency)
ci = class interval (class width)

The steps followed are:

1. Prepare a table having a step distribution column, midpoint column, frequency (f) column, deviation
column, and fd column;

2. Group the scores under the step distribution column using 3 for the class interval (ci);
3. Fill the midpoint column;
4. Determine the step where the assumed mean will lie and enclose it with horizontal lines across the
width of the table.
5. For the assumed mean, one may select any step interval regardless of its frequency. It could be at the
middle, at the bottom, or at the top of the distribution. However, to simplify the computation,
a step near the middle is preferable. Here the step 42 – 44 is chosen, and its midpoint is the assumed mean:

$$AM = \frac{42 + 44}{2} = \frac{86}{2} = 43$$

6. Fill the f column and get the total n, which is 40.
7. Fill the d column beginning from the step where the assumed mean lies; give this step a deviation of 0.
Number the steps upward from 0 as 1, 2, 3, 4, 5, and so on, using positive numbers. Below 0, number the steps
downward as -1, -2, -3, and so on, using negative numbers.
8. Multiply the frequency by the deviation for each step to fill the fd column. Get their summation, which
is equal to -1. This is the algebraic sum of fd. Divide ∑ fd by n and multiply by the
class interval ci:

$$\left(\frac{\sum fd}{n}\right) ci = \frac{(-1)(3)}{40} = -0.075$$

This is the correction for the assumed mean.
9. Add the correction value of -0.075 to the assumed mean value of 43.

M = 43 + (-0.075) = 42.925 or 42.93
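The nine steps above can be sketched as a small function. This is a minimal illustration, assuming equal-width classes given as (lower limit, upper limit, frequency). The name `assumed_mean_method` is hypothetical, and the final cross-check against the plain sum of frequency times midpoint is an addition for verification, not part of the handout's procedure.

```python
import math

# (lower limit, upper limit, frequency) for each class in Table 1.
classes = [(57, 59, 1), (54, 56, 2), (51, 53, 6), (48, 50, 4), (45, 47, 5),
           (42, 44, 7), (39, 41, 2), (36, 38, 3), (33, 35, 5), (30, 32, 3),
           (27, 29, 2)]


def assumed_mean_method(classes, assumed_class, width):
    """Return (mean, sum of fd, n) using the coded-deviation short method."""
    lo, hi = assumed_class
    am = (lo + hi) / 2                      # midpoint of the chosen step
    n = sum(f for _, _, f in classes)
    # d = number of class widths between each midpoint and the assumed mean
    sum_fd = sum(f * round(((l + u) / 2 - am) / width) for l, u, f in classes)
    return am + (sum_fd / n) * width, sum_fd, n


mean, sum_fd, n = assumed_mean_method(classes, (42, 44), 3)

# Cross-check: the same mean computed directly from the midpoints.
direct = sum(f * (l + u) / 2 for l, u, f in classes) / sum(f for _, _, f in classes)

print(sum_fd, n, round(mean, 3), math.isclose(mean, direct))
```

Choosing a different step for the assumed mean, for example (45, 47), changes ∑ fd but not the resulting mean.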

THE MEDIAN

The median is another measure of central tendency. It is the value that divides an array into two equal
parts; that is, the point in a set of variates above which there are as many cases as there are below it. The
data must be arranged from highest to lowest or vice versa.

Computation of Median for Ungrouped Data


To find the median, it is necessary to arrange all the items in a distribution in either ascending or
descending order and pick out the middle item with its corresponding value. When the number of data values n is
odd, the median is the $\left[\frac{n+1}{2}\right]^{\text{th}}$ data value counted either from the highest or
from the lowest of the distribution. When n is even, the median is the average of the
$\left(\frac{n}{2}\right)^{\text{th}}$ score and the $\left(\frac{n}{2}+1\right)^{\text{th}}$ score. This is the
arithmetic mean of the two middle values.

For example, consider these numbers: 34, 35, 36, 37, 38, 39, 40, 41 and 42. The median is 38 since the
number of items on either side is 4. Suppose the numbers were 35, 36, 37, 38, 38, 38, 40, 41, 42. The median
would still be 38 since it is the middle item in the set of values.
However, if the number of data values is even, take the arithmetic mean of the two middle items to find
the median value. In the set below, the two middle items are 38 and 39.
35 36 37 38 39 40 41 42

Median = (38 + 39) / 2 = 77/2 = 38.5
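The odd and even cases can be written as one short function. This is a sketch with an illustrative name, `median_ungrouped`. Python lists are indexed from 0, so the 1-based positions in the text are shifted by one.

```python
def median_ungrouped(values):
    """Middle item of the ordered data, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th item, which is index mid when counting from 0
        return ordered[mid]
    # even n: mean of the (n / 2)th and (n / 2 + 1)th items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([34, 35, 36, 37, 38, 39, 40, 41, 42]))  # 38
print(median_ungrouped([35, 36, 37, 38, 38, 38, 40, 41, 42]))  # 38
print(median_ungrouped([35, 36, 37, 38, 39, 40, 41, 42]))      # 38.5
```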

The middle-item procedure applies when the variable being considered is viewed as a discrete variable. If
the variable is viewed as continuous, the median is calculated by the method used for grouped data in the form
of a frequency distribution. For such data, the formula below (Edwards, 1996) is applied.

$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$

where:
$Md$ = median
$L$ = exact lower limit of the class interval containing the median
$f_c$ = sum of all frequencies below $L$
$f_m$ = frequency of the class interval containing the median
$n$ = number of cases
$ci$ = class interval
Table 2. Computation of Median

Scores    Frequency    Cumulative Frequency
42            3               34
41            1               31
40            5               30
39            2               25
38            8               23
37            7               15
36            3                8
35            4                5
34            1                1
          N = 34

To calculate the median:

1. Compute the cumulative frequencies.
2. Multiply the total number of cases by 0.5: 34 x 0.5 = 17.0.
3. Determine where the (n/2)th score, the median, lies. This is the class with the smallest cumulative
frequency that contains n/2. In our example the 17th score falls within the cumulative frequency 23, which
is across the class 38. So, this class contains the median. The exact lower limit of this class is 37.5,
which is L in the formula.
4. Determine the cumulative frequency of the class immediately below the median class, 38. In our example,
that class is 37, which corresponds to a cumulative frequency of 15. This is fc.
5. Determine the frequency of the median class, 38. This is 8, which is fm.
6. Determine the interval size, ci. In our example, it is only 1 since the scores are not grouped into
categories.
7. Substitute all values needed in the formula.

$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$

$$Md = 37.5 + \left[\frac{17 - 15}{8}\right](1)$$

Md = 37.75

Table 3. Computation of the Median from Grouped Data

Scores    Frequency f    Cumulative Frequency
57-59          1                40
54-56          2                39
51-53          6                37
48-50          4                31
45-47          5                27
42-44          7                22
39-41          2                15
36-38          3                13
33-35          5                10
30-32          3                 5
27-29          2                 2

The steps involved in computing the median from grouped data are as follows:

1. Determine the value that divides the distribution into 2 equal parts, using the "less than"
cumulative frequency. For the foregoing data, 0.5n = (0.5)(40) = 20.
2. The 20th score falls within the cumulative frequency 22. So, the median class is 42 – 44.
3. The exact lower limit of 42 – 44 is 41.5. This is L.
4. The frequency of the median class is fm = 7.
5. The cumulative frequency below the median class is fc = 15.
6. The size of the interval is ci = 3.
7. Substitute the given values in

$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$

$$Md = 41.5 + \left[\frac{20 - 15}{7}\right](3)$$

Md = 43.64
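The median formula and the steps for locating the median class can be sketched as one function that works for both Table 2 (interval size 1) and Table 3 (interval size 3). The name `grouped_median` is hypothetical. The sketch assumes integer scores, so the exact lower limit is taken as the lower score limit minus 0.5, and classes listed from lowest to highest.

```python
def grouped_median(classes, width):
    """classes: (lower score limit, frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    half = 0.5 * n
    below = 0                           # fc: cumulative frequency below the class
    for lower, f in classes:
        if below + f >= half:           # first class whose cumulative f reaches n/2
            exact_lower = lower - 0.5   # L, assuming integer scores
            return exact_lower + (half - below) / f * width
        below += f


# Table 2: single scores, so the interval size is 1.
table2 = [(34, 1), (35, 4), (36, 3), (37, 7), (38, 8),
          (39, 2), (40, 5), (41, 1), (42, 3)]

# Table 3: class intervals of size 3, identified by their lower limits.
table3 = [(27, 2), (30, 3), (33, 5), (36, 3), (39, 2), (42, 7),
          (45, 5), (48, 4), (51, 6), (54, 2), (57, 1)]

md2 = grouped_median(table2, 1)
md3 = grouped_median(table3, 3)
print(md2, round(md3, 2))  # 37.75 43.64
```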
THE MODE

The mode is the value in the data set that occurs most frequently. It is the "most fashionable" value, or
the most popular or common score, and the simplest measure of central tendency. Denoted by Mo, it can be
determined for both qualitative and quantitative data, and it can easily be found by inspection.

Crude Mode

Distributions may have one or more modes. A distribution that has only one mode is unimodal; one with
two modes is bimodal; one with three is trimodal, and so on. A distribution with two or more modes is also
called multimodal. Distribution A below is unimodal (the mode is 38), Distribution B is bimodal (the modes are
36 and 38), and Distribution C is trimodal (the modes are 35, 36, and 38).

Distribution A : 34, 35, 36, 38, 38, 38, 39, 40, 41, 42

Distribution B : 34, 35, 36, 36, 37, 38, 38, 39, 40, 41

Distribution C : 34, 35, 35, 36, 36, 37, 38, 38, 39, 40
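Finding the crude mode by inspection amounts to counting how often each value occurs and keeping every value tied for the highest count. A minimal sketch using the standard library's `collections.Counter` follows. The function name `crude_modes` is illustrative.

```python
from collections import Counter


def crude_modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


dist_a = [34, 35, 36, 38, 38, 38, 39, 40, 41, 42]
dist_b = [34, 35, 36, 36, 37, 38, 38, 39, 40, 41]
dist_c = [34, 35, 35, 36, 36, 37, 38, 38, 39, 40]

print(crude_modes(dist_a))  # [38]
print(crude_modes(dist_b))  # [36, 38]
print(crude_modes(dist_c))  # [35, 36, 38]
```

The length of the returned list tells whether the distribution is unimodal, bimodal, or trimodal.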

True Mode

The true mode can be estimated from the mean (M) and the median (Md). The formula used is Mo = 3Md – 2M.

Table 4. Computation of the True Mode

Class Interval    Frequency f
57-59                  1
54-56                  2
51-53                  6
48-50                  4
45-47                  5
42-44                  7
39-41                  2
36-38                  3
33-35                  5
30-32                  3
27-29                  2

As shown in Table 1, the mean is 42.93, and as shown in Table 3, the median is 43.64. The class interval
with the maximum frequency, 42 – 44, is the modal class of the distribution. The estimated modal value is:

True Mode = 3 (43.64) – 2 (42.93) = 45.06
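Because the formula multiplies the median by 3 and the mean by 2, rounding in the inputs is magnified in the result. The sketch below evaluates Mo = 3Md – 2M twice: once with the two-decimal values used above, and once with the unrounded median and mean from the earlier computations. The function name `estimated_mode` is illustrative.

```python
def estimated_mode(median, mean):
    """Mo = 3Md - 2M, the estimate used in the text."""
    return 3 * median - 2 * mean


# Inputs rounded to two decimals, as in the text.
from_rounded = estimated_mode(43.64, 42.93)

# Unrounded inputs: Md = 41.5 + (20 - 15)/7 * 3 and M = 43 + (-1/40) * 3.
from_unrounded = estimated_mode(41.5 + (20 - 15) / 7 * 3, 43 + (-1 / 40) * 3)

print(round(from_rounded, 2), round(from_unrounded, 2))  # 45.06 45.08
```

The two results differ in the second decimal place, so it helps to carry extra digits until the final step.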

QUANTILES

The median divides the distribution into two equal parts. Going further, there are values that divide
the distribution into n equal parts.
The division can be into 100 equal parts by the percentiles, denoted by $P_1, P_2, \dots, P_{99}$; into 10
equal parts by the deciles, denoted by $D_1, D_2, \dots, D_9$; and into four equal parts by the quartiles,
denoted by $Q_1, Q_2, Q_3$.
Therefore, $Md = Q_2 = P_{50} = D_5$, the point in a distribution which has 50% of the items below it.

$Q_1 = P_{25}$, the point in a distribution which has 25% of the items below it.
$Q_3 = P_{75}$, the point in a distribution which has 75% of the items below it.

The first quartile is also the 25th percentile; the third quartile is the 75th percentile and the second quartile
is the 50th percentile.

Computation of the Quartiles

Computation of Quartiles for Ungrouped Data

For the ungrouped data, pick the values from the ordered set of data

PROVIDE EXAMPLE
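As one possible illustration, the sketch below picks quartiles from an ordered set by taking the median of the lower half and of the upper half. This "median of halves" rule is an assumption: it is only one of several conventions in use, and the handout does not say which one it intends. The function names are hypothetical, and the data are the eight values used earlier for the median.

```python
def middle(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_by_halves(values):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half.

    Assumed convention: when n is odd, the middle item belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2:]
    return middle(lower), middle(ordered), middle(upper)


print(quartiles_by_halves([35, 36, 37, 38, 39, 40, 41, 42]))  # (36.5, 38.5, 40.5)
```

Other conventions, such as interpolating at position (n + 1)/4, can give slightly different values for Q1 and Q3 on the same data.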

Computation of Quartiles for Grouped Data

To determine the first quartile, $Q_1$:

$$Q_1 = L + \left(\frac{\frac{1}{4}N - F_1}{f_1}\right) c$$

To determine the third quartile, $Q_3$:

$$Q_3 = L + \left(\frac{\frac{3}{4}N - F_3}{f_3}\right) c$$

To determine the second quartile, $Q_2$, or the median:

$$Md = L + \left(\frac{0.5N - F_2}{f_2}\right) c$$

where:
$F_3$ = the "less than" cumulative frequency up to the class immediately preceding the third quartile
class. The third quartile class is the class which contains the $\left(\frac{3N}{4}\right)^{\text{th}}$ item.
$F_1$ = the "less than" cumulative frequency up to the class immediately preceding the first quartile
class. The first quartile class is the class which contains the $\left(\frac{N}{4}\right)^{\text{th}}$ item.
$L$ = exact lower limit of the quartile class
$f$ = frequency of the quartile class
$c$ = class interval width
Computation of Decile for Grouped Data

Formulas for the deciles can be derived by the same procedure used for the quartiles. This time, the
distribution is divided into 10 equal parts.

To determine the first decile, $D_1$:

$$D_1 = L + \left(\frac{\frac{N}{10} - F_1}{f_1}\right) c$$

To determine the second decile, $D_2$:

$$D_2 = L + \left(\frac{\frac{N}{5} - F_2}{f_2}\right) c$$

Computation of Percentile for Grouped Data

$$P_p = L + \left(\frac{pN - F}{f_p}\right) c$$

where:
$p$ = percentage of the distribution wanted, expressed as a proportion (for example, 0.25 for $P_{25}$)
$L$ = exact lower limit of the class interval upon which $P_p$ lies
$pN$ = part of N to be counted off in order to reach $P_p$
$F$ = sum of all scores upon intervals below $L$
$f_p$ = number of scores within the interval upon which $P_p$ falls
$c$ = class interval
Example:

Find the first, second, and third quartiles in the distribution below:

Class Limits     f     Cumulative f (less than)
200-204          2          50
195-199          1          48
190-194          4          47
185-189          8          43
180-184          5          35
175-179         12          30
170-174          7          18
165-169          8          11
160-164          3           3
             N = 50

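SOLUTION:

Since N/4 = 12.5, the first quartile class is 170-174, so L = 169.5, F1 = 11, and f1 = 7.

$$Q_1 = L + \left(\frac{\frac{1}{4}N - F_1}{f_1}\right) c$$

$$Q_1 = 169.5 + \left(\frac{12.5 - 11}{7}\right) 5$$

Q1 = 170.57

Since 0.5N = 25, the second quartile class is 175-179, so L = 174.5, F2 = 18, and f2 = 12.

$$Q_2 = Md = L + \left(\frac{0.5N - F_2}{f_2}\right) c$$

$$Q_2 = 174.5 + \left(\frac{0.5(50) - 18}{12}\right) 5$$

Q2 = 177.42

Since 3N/4 = 37.5, the third quartile class is 185-189, so L = 184.5, F3 = 35, and f3 = 8.

$$Q_3 = L + \left(\frac{\frac{3}{4}N - F_3}{f_3}\right) c$$

$$Q_3 = 184.5 + \left(\frac{\frac{3}{4}(50) - 35}{8}\right) 5$$

Q3 = 186.06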
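The quartile, decile, and percentile formulas share one form, $P_p = L + \left(\frac{pN - F}{f_p}\right) c$, so a single function can serve all three. The sketch below is a minimal illustration with the hypothetical name `grouped_quantile`. It assumes integer class limits (exact lower limit = lower limit minus 0.5) and classes listed from lowest to highest, with the lowest class taken as 160-164. For each quantile it uses the frequency of the class that contains the (pN)th item. For Q3 that item is the 37.5th, which falls in 185-189 with frequency 8.

```python
def grouped_quantile(classes, width, p):
    """classes: (lower class limit, frequency), ascending; p: proportion, 0 < p <= 1."""
    n = sum(f for _, f in classes)
    target = p * n                      # pN: how many cases to count off
    below = 0                           # F: cumulative frequency below the class
    for lower, f in classes:
        if below + f >= target:         # class containing the (pN)th item
            return (lower - 0.5) + (target - below) / f * width
        below += f


# Class lower limits and frequencies from the example, lowest class first.
table = [(160, 3), (165, 8), (170, 7), (175, 12), (180, 5),
         (185, 8), (190, 4), (195, 1), (200, 2)]

q1, q2, q3 = (grouped_quantile(table, 5, p) for p in (0.25, 0.5, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 170.57 177.42 186.06
```

Calling the same function with p = 0.1 or p = 0.9 gives the first decile or the 90th percentile without a separate formula.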
