EDA Topic 3
INSTRUCTIONAL MATERIAL #3
To get a quick picture of the shape of a distribution and the characteristics of the population from which the
data were collected, a statistician needs measures that summarize the data. One such measure is a single number
that is typical of the general level of the measurements in a set. This single figure, the point around which the
scores concentrate, is called a measure of central tendency. The three common measures of central tendency are
the mean, the median, and the mode.
THE MEAN
The arithmetic mean, or simply the mean (commonly called the average), is determined by adding the scores
together and dividing the sum by the number of scores.
Symbolically,
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}$$
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
where:
$\mu$ = population mean
$\bar{x}$ = sample mean
$X_i$ = the value of the $i$th observation
$N$ = population size
$n$ = sample size
Example 1. Here is an array of students’ scores in a quiz. Compute the mean score.
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{35 + 42 + \dots + 75}{30}$$
$$\bar{x} = 59$$
The sample mean, denoted by $\bar{x}$, is generally employed as an estimator (or predictor) of $\mu$, the
mean of the population, which is unknown.
When n is small, the definition form $\bar{x} = \frac{\sum X}{n}$ can be used to compute the mean, but when n is
large, say 50, 100, or more, this method is not practical. Instead, a frequency distribution is constructed, from
which the statistical measures needed to describe the distribution can be computed.
Another method for calculating the mean is to multiply each score by its corresponding frequency, add the
products, and divide by the total frequency (the number of scores). This applies to a frequency distribution of
ungrouped data, which shows the number of times each variate occurs. In symbols,
$$\bar{x} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_k f_k}{f_1 + f_2 + \dots + f_k}$$
where:
$X_i$ = the $i$th distinct value of X in the set
$f_i$ = frequency of the corresponding score
$k$ = number of different values of X in the set
Example 2.
Xi      fi      fiXi
60      4       240
58      8       464
65      12      780
63      5       315
52      10      520
55      13      715
50      15      750
70      8       560
56      11      616
67      9       603
∑fi = 95        ∑fiXi = 5563
Each value of $X_i$ above is weighted by its corresponding frequency, $f_i$. The sum of all products in the
third column equals 5563. Dividing this by the total frequency, 95, gives the weighted arithmetic mean,
$\bar{x} = 5563 / 95 = 58.56$.
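The computation in Example 2 can be checked with a short Python sketch. The function name `weighted_mean` is a hypothetical helper introduced here for illustration; it is not part of the original material.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 2 data: scores X_i and their frequencies f_i
scores = [60, 58, 65, 63, 52, 55, 50, 70, 56, 67]
freqs = [4, 8, 12, 5, 10, 13, 15, 8, 11, 9]

mean_score = weighted_mean(scores, freqs)
print(sum(freqs))            # 95
print(round(mean_score, 2))  # 58.56
```

Because the frequencies act as the weights, the same helper works for any set of weights, not only counts.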
This long method of computing the mean is conveniently used in finding the mean of not more than 30
cases.
For other types of data, the weights may be denoted by $W_1, W_2, W_3, \dots, W_k$, which may represent the
importance attached to the variates. Thus,
$$\bar{x} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_k X_k}{W_1 + W_2 + \dots + W_k} \quad \text{or} \quad \bar{x} = \frac{\sum WX}{\sum W}$$
Example 3. In the performance evaluation of teachers, suppose the supervisor’s evaluation is given a weight of
5, self-evaluation 2, peer evaluation 2, and client evaluation 1. If the teacher’s ratings are 90, 95, 85, and
90, respectively, the mean rating of the teacher would be:
$$\bar{x} = \frac{5(90) + 2(95) + 2(85) + 1(90)}{10} = \frac{900}{10} = 90$$
The short method in computing the mean is conveniently used for more than 30 cases because it involves
the use of small numbers.
30 33 54 44 53 49 46 44
32 35 57 43 56 50 45 43
31 34 51 39 52 49 46 42
28 33 52 41 51 45 47 44
27 34 53 36 48 42 37 38
$$M = AM + \left(\frac{\sum fd}{n}\right) ci$$
where:
AM = assumed mean
∑fd = algebraic sum of the products of the frequencies and their corresponding deviations from
the assumed mean (in class-interval units)
n = number of cases
ci = class interval
1. Prepare a table having a step distribution column, midpoint column, frequency (f) column, deviation (d)
column, and fd column;
2. Group the scores under the step distribution column using 3 for the class interval (ci);
3. Fill in the midpoint column;
4. Determine the step where the assumed mean will lie and enclose it with horizontal lines across the
width of the table.
5. For the assumed mean, one may select any step interval regardless of its frequency. It could be at the
middle, at the bottom, or at the top of the distribution. However, to facilitate the computation,
the middle step is preferable. The assumed mean is the midpoint of the chosen step:
$$AM = \frac{42 + 44}{2} = \frac{86}{2} = 43$$
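The steps above can be sketched in Python on the 40 scores listed earlier. This is a minimal illustration under stated assumptions: the classes are taken to start at 27 (27-29, 30-32, ..., 57-59), which is inferred from the class 42 – 44 used for the assumed mean and is not shown explicitly in the surviving text. The names `CI`, `LOWEST`, and `class_midpoint` are hypothetical.

```python
scores = [
    30, 33, 54, 44, 53, 49, 46, 44,
    32, 35, 57, 43, 56, 50, 45, 43,
    31, 34, 51, 39, 52, 49, 46, 42,
    28, 33, 52, 41, 51, 45, 47, 44,
    27, 34, 53, 36, 48, 42, 37, 38,
]

CI = 3             # class interval, as in step 2
LOWEST = 27        # ASSUMPTION: lower limit of the first class (27-29)
ASSUMED_MEAN = 43  # midpoint of the step 42-44


def class_midpoint(score):
    """Midpoint of the class containing an integer score (odd CI only)."""
    lower = LOWEST + ((score - LOWEST) // CI) * CI
    return lower + CI // 2


# Tally the frequency f of each class, keyed by its midpoint.
freq = {}
for s in scores:
    m = class_midpoint(s)
    freq[m] = freq.get(m, 0) + 1

# d = deviation of each midpoint from the assumed mean, in class-interval units.
sum_fd = sum(f * ((m - ASSUMED_MEAN) // CI) for m, f in freq.items())
n = len(scores)
mean = ASSUMED_MEAN + (sum_fd / n) * CI

print(sum_fd)          # -1
print(round(mean, 3))  # 42.925
```

Working in class-interval units keeps the deviations small integers (here from -5 to 5), which is why this short method involves only small numbers.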
THE MEDIAN
The median is another measure of central tendency. It is the value that divides an array into two equal
parts; that is, the point in a set of variates with as many cases above it as below it. The data must be
arranged from highest to lowest or vice versa.
When n is odd, the median is the $\left[\frac{n+1}{2}\right]$th data value counted either from the highest or
from the lowest of the distribution. When n is even, the median is the average of the
$\left(\frac{n}{2}\right)$th score and the $\left(\frac{n}{2} + 1\right)$th score, that is, the arithmetic mean
of the two middle values.
For example, consider these numbers: 34, 35, 36, 37, 38, 39, 40, 41 and 42. The median is 38 since the
number of items on either side is 4. Suppose the numbers were 35, 36, 37, 38, 38, 38, 40, 41, 42. The median
would still be 38 since it is the middle item in the set of values.
However, if the number of items in the set is even, take the arithmetic mean of the two middle items to find
the median value.
35 36 37 38 39 40 41 42
Here the two middle items are 38 and 39, so the median is (38 + 39) / 2 = 38.5.
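The odd and even cases can be illustrated with a small Python sketch. The function `median` below is written for illustration only; Python's standard library offers an equivalent `statistics.median`.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th value, counting from 1
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([34, 35, 36, 37, 38, 39, 40, 41, 42]))  # 38
print(median([35, 36, 37, 38, 38, 38, 40, 41, 42]))  # 38
print(median([35, 36, 37, 38, 39, 40, 41, 42]))      # 38.5
```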
This holds when the variable being considered is viewed as a discrete variable. If the variable is
viewed as continuous, the median is calculated by the method used for grouped data in the form of a
frequency distribution. For such cases, the formula below (Edwards, 1996) is applied.
$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$
where:
Md = median
L = exact lower limit of the class interval containing the median
$f_c$ = sum of all frequencies below L
$f_m$ = frequency of the class interval containing the median
n = number of cases
ci = class interval
Table 2. Computation of Median
$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$
$$Md = 37.5 + \left[\frac{17 - 15}{8}\right](1)$$
$$Md = 37.75$$
INSTRUCTOR: ENGR. YVONNE ANGELYN R. ALIAS
MATH 122 : ENGINEERING DATA ANALYSIS
Table 3. Computation of the Median from Grouped Data
The steps involved in computing the median from grouped data are as follows:
1. Determine the position that divides the distribution into two equal parts, using the “less than”
cumulative frequency. For the foregoing data, N × 0.5 = (40)(0.5) = 20.
2. The 20th score falls in the class whose cumulative frequency is 22. So, the median class is 42 – 44.
3. The exact lower limit of 42 – 44 is 41.5. This is L.
4. The frequency of the median class, fm, is 7.
5. The cumulative frequency below the median class, fc, is 15.
6. The size of the interval, ci, is 3.
7. Substitute the given values in
$$Md = L + \left[\frac{0.5n - f_c}{f_m}\right] ci$$
$$Md = 41.5 + \left[\frac{20 - 15}{7}\right](3)$$
$$Md = 43.64$$
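The substitution in step 7 can be verified with a minimal Python sketch. The function name `grouped_median` and its parameter names are hypothetical; they simply mirror the symbols L, n, fc, fm, and ci in the formula.

```python
def grouped_median(lower_limit, n, cum_below, class_freq, width):
    """Md = L + ((0.5 * n - fc) / fm) * ci for grouped data."""
    if class_freq <= 0:
        raise ValueError("the median class must have a positive frequency")
    return lower_limit + (0.5 * n - cum_below) / class_freq * width


# Values from steps 1 to 6: L = 41.5, n = 40, fc = 15, fm = 7, ci = 3
md = grouped_median(41.5, 40, 15, 7, 3)
print(round(md, 2))  # 43.64
```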
THE MODE
The mode is the value in the data set that occurs most frequently. It is the “most fashionable” value, or
the most popular or common score, and the simplest measure of central tendency. Denoted by Mo, it can be
determined for both qualitative and quantitative data, and it can easily be found by inspection.
Crude Mode
Distributions may have one or more modes. A distribution that has only one mode is unimodal; one with
two modes is bimodal; one with three is trimodal; and so on. A distribution with two or more modes is also
called multimodal. Distribution A below is unimodal (the mode is 38), Distribution B is bimodal (the modes are
36 and 38), and Distribution C is trimodal (the modes are 35, 36, and 38).
Distribution A : 34, 35, 36, 38, 38, 38, 39, 40, 41, 42
Distribution B : 34, 35, 36, 36, 37, 38, 38, 39, 40, 41
Distribution C : 34, 35, 35, 36, 36, 37, 38, 38, 39, 40
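The crude modes of the three distributions can be found by counting, as in this short Python sketch. It uses `statistics.multimode` from the standard library (available from Python 3.8), which returns every value tied for the highest frequency.

```python
from statistics import multimode

dist_a = [34, 35, 36, 38, 38, 38, 39, 40, 41, 42]
dist_b = [34, 35, 36, 36, 37, 38, 38, 39, 40, 41]
dist_c = [34, 35, 35, 36, 36, 37, 38, 38, 39, 40]

print(multimode(dist_a))  # [38]
print(multimode(dist_b))  # [36, 38]
print(multimode(dist_c))  # [35, 36, 38]
```

The length of the returned list tells whether the distribution is unimodal, bimodal, or trimodal.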
True Mode
This can be estimated from the mean and the median. The formula used is Mo = 3Md – 2M.
As shown in Table 1, the mean = 42.93, and in Table 3, the median = 43.64. The class interval
corresponding to the maximum frequency is the modal class of the distribution; the modal value estimated from
the formula is:
$$Mo = 3(43.64) - 2(42.93) = 130.92 - 85.86 = 45.06$$
QUANTILES
The median divides the distribution into two equal parts. Going further, there are values that divide
the distribution into n equal parts.
The distribution can be divided into 100 equal parts by the percentiles, denoted by $P_1, P_2, \dots, P_{99}$;
into 10 equal parts by the deciles, denoted by $D_1, D_2, \dots, D_9$; and into four equal parts by the
quartiles, denoted by $Q_1, Q_2, Q_3$.
Therefore, $Md = Q_2 = P_{50} = D_5$, the point in a distribution which has 50% of the items below it.
$Q_1 = P_{25}$, the point in a distribution which has 25% of the items below it.
$Q_3 = P_{75}$, the point in a distribution which has 75% of the items below it.
The first quartile is also the 25th percentile, the second quartile is the 50th percentile, and the third
quartile is the 75th percentile.
For ungrouped data, pick the values from the ordered set of data.
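As an illustration for ungrouped data, the sketch below finds the quartiles of the nine ordered values 34 to 42 used earlier for the median. It relies on `statistics.quantiles` from Python's standard library with the `"exclusive"` method, which places the quartiles at positions i(n + 1)/4 and interpolates between neighbouring values. This convention is an assumption, since the text does not say which rule to use; other conventions can give slightly different values.

```python
from statistics import quantiles

data = [34, 35, 36, 37, 38, 39, 40, 41, 42]  # already ordered, n = 9

# ASSUMPTION: positions i * (n + 1) / 4 with linear interpolation
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)  # 35.5 38.0 40.5
```

The second quartile, 38, agrees with the median found for the same data.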
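For grouped data, the quartiles are computed as:
$$Q_1 = L + \left(\frac{\frac{1}{4}N - F_1}{f_1}\right) c$$
$$Q_3 = L + \left(\frac{\frac{3}{4}N - F_3}{f_3}\right) c$$
$$Md = L + \left(\frac{0.5N - F_2}{f_2}\right) c$$
where:
$F_3$ = the cumulative frequency (“less than”) up to the class immediately preceding the third quartile
class. The third quartile class is the class which contains the $\left(\frac{3N}{4}\right)$th item.
$F_1$ = the cumulative frequency (“less than”) up to the class immediately preceding the first quartile
class. The first quartile class is the class which contains the $\left(\frac{N}{4}\right)$th item.
L = exact lower limit of the class containing the quartile
f = class frequency
c = class interval width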
Computation of Decile for Grouped Data
One can derive the formulas using the same procedure as in calculating quartiles. This time, the
distribution is divided into 10 equal parts.
$$D_1 = L + \left(\frac{\frac{N}{10} - F_1}{f_1}\right) c$$
$$D_2 = L + \left(\frac{\frac{N}{5} - F_2}{f_2}\right) c$$
$$P_p = L + \left(\frac{pN - F}{f_p}\right) c$$
where:
p = percentage of the distribution wanted
L = exact lower limit of the class interval upon which $P_p$ lies
pN = part of N to be counted off in order to reach $P_p$
F = sum of all scores in the intervals below L
$f_p$ = number of scores within the interval upon which $P_p$ falls
c = class interval
Example:
Find the first, second, and third quartiles in the distribution below:
Class Limits    f     Cumulative f (less than)
200-204         2     50
195-199         1     48
190-194         4     47
185-189         8     43
180-184         5     35
175-179         12    30
170-174         7     18
165-169         8     11
160-164         3     3
N = 50
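SOLUTION:
$$Q_1 = L + \left(\frac{\frac{1}{4}N - F_1}{f_1}\right) c$$
$$Q_1 = 169.5 + \left(\frac{12.5 - 11}{7}\right) 5$$
$$Q_1 = 170.57$$
$$Q_2 = Md = L + \left(\frac{0.5N - F_2}{f_2}\right) c$$
$$Q_2 = 174.5 + \left(\frac{0.5(50) - 18}{12}\right) 5$$
$$Q_2 = 177.42$$
$$Q_3 = L + \left(\frac{\frac{3}{4}N - F_3}{f_3}\right) c$$
$$Q_3 = 184.5 + \left(\frac{\frac{3}{4}(50) - 35}{8}\right) 5$$
$$Q_3 = 186.06$$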
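The quartile computations can be cross-checked with a general Python sketch of the grouped-data formula $P_p = L + \left(\frac{pN - F}{f_p}\right) c$. The function name `grouped_quantile` is hypothetical. The sketch assumes integer class limits, so the exact lower limit is the stated limit minus 0.5, and it assumes the lowest class is 160-164, in line with the class width of 5 used elsewhere in the table.

```python
# (lower class limit, upper class limit, frequency), lowest class first
classes = [
    (160, 164, 3),  # ASSUMPTION: lowest class spans 160-164 (width 5)
    (165, 169, 8),
    (170, 174, 7),
    (175, 179, 12),
    (180, 184, 5),
    (185, 189, 8),
    (190, 194, 4),
    (195, 199, 1),
    (200, 204, 2),
]


def grouped_quantile(classes, p):
    """Point below which a proportion p (0 < p < 1) of the cases fall."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = sum(f for _, _, f in classes)
    target = p * n                # pN, the number of cases to count off
    cum_below = 0                 # F, cumulative frequency below the class
    for lower, upper, f in classes:
        if cum_below + f >= target:
            exact_lower = lower - 0.5   # L, exact lower limit
            width = upper - lower + 1   # c, class interval width
            return exact_lower + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("quantile class not found")


q1 = grouped_quantile(classes, 0.25)
q2 = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 170.57 177.42 186.06
```

For the third quartile, the 37.5th case falls in the class 185-189, so the formula uses L = 184.5, F = 35, and f = 8. The same function gives deciles and percentiles by passing, for example, p = 0.1 or p = 0.9.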