MATM 111 Data Management Dispersion and Normal Curve
B. QUARTILES (Q)
A measure that divides a set of data or a distribution into 4 equal parts. The dividing points are the three quartiles Q1, Q2, and Q3.
To find the kth quartile of ungrouped data, follow these steps:
1. Arrange the data in ascending order.
2. Divide k by 4, then multiply by the sample size n to get the position of Qk:
$$\text{position of } Q_k = \frac{k}{4}\,n$$
3. If the computed value is not an integer, round it up and take the data value in that position.
C. DECILES (D)
A measure that divides a set of data or a distribution into 10 equal parts. The dividing points are the nine deciles D1, D2, D3, …, D9.
To find the kth decile of ungrouped data, follow these steps:
1. Arrange the data in ascending order.
2. Divide k by 10, then multiply by the sample size n to get the position of Dk:
$$\text{position of } D_k = \frac{k}{10}\,n$$
3. If the computed value is not an integer, round it up and take the data value in that position.
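Example 1: Given the following scores (n = 18):
Scores: 11, 15, 16, 12, 9, 20, 18, 19, 14, 13, 20, 11, 19, 21, 25, 15, 17, 11
Arranged in ascending order: 9, 11, 11, 11, 12, 13, 14, 15, 15, 16, 17, 18, 19, 19, 20, 20, 21, 25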
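a) P15 = 11
$$\text{position of } P_k = \frac{k}{100}\,n = \frac{15(18)}{100} = 2.7 \approx \text{3rd data}; \quad P_{15} = 11$$
b) Q1 = 12
$$\text{position of } Q_k = \frac{k}{4}\,n = \frac{1(18)}{4} = 4.5 \approx \text{5th data}; \quad Q_1 = 12$$
c) D7 = 19
$$\text{position of } D_k = \frac{k}{10}\,n = \frac{7(18)}{10} = 12.6 \approx \text{13th data}; \quad D_7 = 19$$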
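The positional rule used in Example 1 can be checked with a short Python sketch. The function name `locate` is hypothetical, and treating a whole-number position as exactly that position is an assumption, since the steps above only say what to do when the position is not a whole number. `Fraction` keeps k·n/parts exact, so rounding up is never thrown off by floating-point error.

```python
import math
from fractions import Fraction


def locate(data, k, parts):
    """Value of the kth dividing point for ungrouped data.

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    A whole-number position is taken as-is (an assumption; the handout
    only covers non-integer positions, which are rounded up).
    """
    ordered = sorted(data)                          # step 1: ascending order
    position = Fraction(k * len(ordered), parts)    # step 2: (k / parts) * n, kept exact
    rank = math.ceil(position)                      # step 3: round up a non-integer position
    return ordered[rank - 1]                        # positions are counted from 1


scores = [11, 15, 16, 12, 9, 20, 18, 19, 14, 13, 20, 11, 19, 21, 25, 15, 17, 11]
print("P15 =", locate(scores, 15, 100))  # position 2.7 -> 3rd value -> 11
print("Q1  =", locate(scores, 1, 4))     # position 4.5 -> 5th value -> 12
print("D7  =", locate(scores, 7, 10))    # position 12.6 -> 13th value -> 19
```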
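Formula (grouped data):
$$P_k = L_b + \left(\frac{\frac{nk}{100} - \mathit{fl}}{f}\right) i$$
$$Q_k = L_b + \left(\frac{\frac{nk}{4} - \mathit{fl}}{f}\right) i$$
$$D_k = L_b + \left(\frac{\frac{nk}{10} - \mathit{fl}}{f}\right) i$$
Where: Lb = lower boundary of the class containing the required percentile, quartile, or decile; fl = cumulative frequency of the classes below that class; f = frequency of that class; i = class size.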
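A minimal Python sketch of the grouped-data formula, assuming fl is the cumulative frequency of the classes below the quantile class and that all classes have the same size i. The function name, the (lower boundary, frequency) layout, and the frequency table are hypothetical, for illustration only.

```python
def grouped_quantile(classes, width, k, parts):
    """Interpolated kth quantile for grouped data: Lb + ((nk/parts - fl) / f) * i.

    classes: (lower_boundary, frequency) pairs in ascending order (hypothetical layout)
    width:   class size i, assumed equal for every class
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles
    """
    n = sum(f for _, f in classes)
    target = n * k / parts              # nk/4, nk/10 or nk/100
    cumulative = 0                      # fl: frequency of the classes below
    for lower_boundary, f in classes:
        if cumulative + f >= target:    # first class that reaches the target position
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k / parts must not exceed 1")


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49 (lower boundaries 9.5, 19.5, ...).
table = [(9.5, 4), (19.5, 6), (29.5, 8), (39.5, 2)]
print(grouped_quantile(table, 10, 3, 4))     # Q3  -> 35.75
print(grouped_quantile(table, 10, 90, 100))  # P90 -> 39.5
```

Scanning the cumulative frequency replaces the manual step of locating the quantile class; the interpolation itself is the formula above.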
MEASURE OF DISPERSION
- A measure of variation shows the extent to which numerical values tend to spread out about the average.
A measure of dispersion is also used to determine the consistency or homogeneity of a set of data.
MEASURES OF DISPERSION
1.) Range
- quick to compute, but it is a poor measure of variation because it considers only the extreme values.
- It tells nothing about the distribution of the numbers in between.
2.) Mean Absolute Deviation (MAD)
- a modification of the range, and more reliable than the range because it uses every value in the data set rather than only the extremes.
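The handout does not write out the MAD formula here; the sketch below assumes the usual definition, the mean of the absolute deviations from the mean, MAD = Σ|x − x̄| / n. It reuses the five sample scores from the variance example later in this handout. The function name is hypothetical.

```python
from statistics import mean


def mean_absolute_deviation(data):
    """MAD = sum(|x - mean|) / n, the average distance of the values from their mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)


scores = [70, 85, 80, 90, 75]
print(mean_absolute_deviation(scores))  # absolute deviations 10, 5, 0, 10, 5 -> 30 / 5 = 6.0
```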
3.) QUANTILE DEVIATION
Another way of measuring the variability of observations is through quantile deviations: the percentile deviation, decile deviation, interquartile range, and quartile deviation.
a. Percentile Deviation
Describes the variation of the middle 80% of the data set:
P.D. = P90 − P10
b. Decile Deviation
Like the percentile deviation, it describes the variation of the middle 80% of the data set:
D.D. = D9 − D1
[Figure: number line of deciles 0 to 10, with the middle 80% (from D1 to D9) marked]
c. Interquartile Range
A measure of variation based on the quartiles of a distribution. It describes the variation of the middle 50% of the data set.
I.R. = Q3 − Q1
d. Quartile Deviation
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
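a.) To find the percentile deviation:
$$P_{10}:\ \frac{nk}{100} = \frac{(18)(10)}{100} = 1.8 \;\Rightarrow\; \text{2nd score} = 21$$
$$P_{90}:\ \frac{nk}{100} = \frac{(18)(90)}{100} = 16.2 \;\Rightarrow\; \text{17th score} = 44$$
*Percentile deviation = P90 − P10
= 44 − 21
= 23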
b.) To find the decile deviation:
$$D_1:\ \frac{nk}{10} = \frac{(18)(1)}{10} = 1.8 \;\Rightarrow\; \text{2nd score} = 21$$
$$D_9:\ \frac{nk}{10} = \frac{(18)(9)}{10} = 16.2 \;\Rightarrow\; \text{17th score} = 44$$
*Decile deviation = D9 − D1
= 44 − 21
= 23
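c.) To find the interquartile range:
$$Q_3:\ \frac{nk}{4} = \frac{(18)(3)}{4} = 13.5 \;\Rightarrow\; \text{14th score} = 36$$
$$Q_1:\ \frac{nk}{4} = \frac{(18)(1)}{4} = 4.5 \;\Rightarrow\; \text{5th score} = 26$$
*Interquartile range = Q3 − Q1
= 36 − 26
= 10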
d.) To find the quartile deviation:
*Quartile deviation = (Q3 − Q1) / 2
= (36 − 26) / 2
= 5
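The scores behind the worked example above are not listed in this section, so this sketch applies the same positional rule to the 18 scores of Example 1 (on quartiles, deciles, and percentiles). The function names and dictionary keys are hypothetical, and a non-integer position is rounded up as in the steps for ungrouped data.

```python
import math
from fractions import Fraction


def locate(data, k, parts):
    """kth dividing point of ungrouped data; a non-integer position is rounded up."""
    ordered = sorted(data)
    rank = math.ceil(Fraction(k * len(ordered), parts))
    return ordered[rank - 1]


def quantile_deviations(data):
    """The four quantile-based measures of spread."""
    q1, q3 = locate(data, 1, 4), locate(data, 3, 4)
    return {
        "PD": locate(data, 90, 100) - locate(data, 10, 100),  # P90 - P10
        "DD": locate(data, 9, 10) - locate(data, 1, 10),      # D9 - D1
        "IR": q3 - q1,                                        # Q3 - Q1
        "QD": (q3 - q1) / 2,                                  # (Q3 - Q1) / 2
    }


scores = [11, 15, 16, 12, 9, 20, 18, 19, 14, 13, 20, 11, 19, 21, 25, 15, 17, 11]
print(quantile_deviations(scores))  # {'PD': 10, 'DD': 10, 'IR': 7, 'QD': 3.5}
```

With n = 18, the positions (2nd, 17th, 5th, 14th) match those used in the worked example; only the underlying scores differ.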
The variance, or mean square deviation, is the expected value of the squared deviation of an observation from its theoretical mean.
The standard deviation is obtained by taking the square root of the calculated variance, and it is considered the most reliable measure of dispersion.
The variance and standard deviation are based on all items in the data set, and each item is given a proper weight. These two are very useful measures of variability because they measure the average scattering of the data around the mean. The variance and the standard deviation increase as the deviations about the mean increase, and decrease as these deviations decrease. A small standard deviation (and variance) means a high degree of uniformity in the observations and homogeneity in the distribution. The variance is the most suitable for algebraic manipulations, but its result is in squared units. On the other hand, the standard deviation has a value in the original units of the data. Thus, the standard deviation serves as the primary measure of dispersion, just as the mean serves as the primary measure of central tendency.
The standard deviation, however, has its own limitations. It gives more weight to extreme values and less to those near the mean, and its computation is not as easy as that of the range. It is also not appropriate for comparing two or more data sets measured in different units or at different levels.
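Formula:
Range (R)
Ungrouped: R = Hs − Ls (Hs = highest score, Ls = lowest score)
Grouped: R = UBLHCL − LBLLCI (upper boundary of the highest class minus lower boundary of the lowest class)
Variance (Var)
a. Population variance (σ²)
$$\text{Ungrouped: } \sigma^2 = \frac{\sum (x-\mu)^2}{N} \qquad \text{Grouped: } \sigma^2 = \frac{\sum f(x-\mu)^2}{N}$$
b. Sample variance (s²)
$$\text{Ungrouped: } s^2 = \frac{\sum (x-\bar{x})^2}{n-1} \qquad \text{Grouped: } s^2 = \frac{\sum f(x-\bar{x})^2}{n-1}$$
Standard Deviation (s, sd)
sd = √var
a. For population standard deviation (σ): σ = √σ²
b. For sample standard deviation (s): s = √s²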
Note: The sample variance and the population variance have different denominators: the sample variance uses n − 1 instead of n. Statistical theory shows that if we take many samples from a given population, find the sample variance of each sample, and average these together, this average will equal the population variance when n − 1 is used in the denominator.
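The note's claim can be checked exactly on a tiny example. This sketch assumes sampling with replacement (every ordered sample of size 2 equally likely) from a hypothetical three-value population; averaging the sample variances over all nine possible samples recovers the population variance only when n − 1 is the divisor. The helper name and its `ddof` parameter are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def variance(values, ddof):
    """Sum of squared deviations divided by (len(values) - ddof); exact arithmetic."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)


population = [1, 3, 5]                         # hypothetical population
sigma2 = variance(population, ddof=0)          # population variance, divisor N

samples = list(product(population, repeat=2))  # all 9 ordered samples of size 2
avg_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)
avg_n = sum(variance(s, ddof=0) for s in samples) / len(samples)

print("population variance: ", sigma2)         # 8/3
print("average with n - 1:  ", avg_n_minus_1)  # 8/3, equal to the population variance
print("average with n:      ", avg_n)          # 4/3, too small
```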
The accuracy and position of a score in the frequency distribution relative to the mean can be determined by using Chebyshev's Theorem:
The proportion (or percentage) of any data set that lies within k standard deviations of the mean is at least
$$1 - \frac{1}{k^2}$$
Note: k is any number greater than 1. The theorem applies to any distribution, regardless of its shape.
a. At least 75% of the observations are within 2 standard deviations of the mean (k = 2: 1 − 1/4 = 0.75).
b. At least 88.9% of the observations are within 3 standard deviations of the mean (k = 3: 1 − 1/9 ≈ 0.889).
Example: In the last prelim, the midterm exam scores of 50 BS PSY students had a mean of 55 and a standard deviation of 12 points.
a.) At least 75% of the students had scores between 31 and 79 (55 ± 2 × 12).
b.) At least 88.9% of the students had scores between 19 and 91 (55 ± 3 × 12).
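A small Python sketch of Chebyshev's bound and the matching interval, applied to the example above (mean 55, standard deviation 12). The function name is hypothetical.

```python
def chebyshev(mean, sd, k):
    """Minimum proportion within k standard deviations of the mean, and that interval (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    proportion = 1 - 1 / k ** 2
    return proportion, (mean - k * sd, mean + k * sd)


for k in (2, 3):
    share, (low, high) = chebyshev(55, 12, k)
    print(f"k = {k}: at least {share:.1%} of scores lie between {low} and {high}")
```

The bound holds for any distribution, which is why it is only a minimum ("at least"); for a normal distribution the actual proportions are higher.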
Example 1: Suppose you have the sample scores 70, 85, 80, 90, and 75. Solve for the range, variance, and standard deviation.
Solution:
a. Range
R = Hs − Ls = 90 − 70
R = 20
b. Variance
Solve for the mean first:
$$\bar{x} = \frac{\sum x}{n} = \frac{400}{5} = 80$$
Scores (x)   (x − x̄)              (x − x̄)²
70           (70 − 80) = −10       100
75           (75 − 80) = −5        25
80           (80 − 80) = 0         0
85           (85 − 80) = 5         25
90           (90 − 80) = 10        100
Σx = 400                           Σ(x − x̄)² = 250
$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{250}{5-1} = \frac{250}{4}$$
s² = 62.5
c. Standard Deviation
$$s = \sqrt{s^2} = \sqrt{62.5}$$
s ≈ 7.91
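The hand computation above can be reproduced step by step in Python, and cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which use the n − 1 divisor for sample data.

```python
import math
import statistics

scores = [70, 85, 80, 90, 75]

data_range = max(scores) - min(scores)        # R = Hs - Ls
x_bar = sum(scores) / len(scores)             # mean = 400 / 5
ss = sum((x - x_bar) ** 2 for x in scores)    # sum of squared deviations
s2 = ss / (len(scores) - 1)                   # sample variance, divisor n - 1
s = math.sqrt(s2)                             # sample standard deviation

print(data_range, x_bar, ss, s2, round(s, 2))  # 20 80.0 250.0 62.5 7.91

# Cross-check against the standard library's sample formulas.
assert math.isclose(s2, statistics.variance(scores))
assert math.isclose(s, statistics.stdev(scores))
```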
Example 2: Consider the following set of observations. Four students took exams in five subjects at a certain university:
Student   Eng   Fil   Sci   Math   Abstract
A         20    24    26    22     23
B         18    26    28    20     23
C         30    24    22    19     20
D         27    25    20    19     24
x̄A = __________   x̄B = __________   x̄C = __________   x̄D = __________
The mean score of student A is 23, student B is 23, student C is 23, and student D is 23. This shows that all of them have the same average score.
State your analysis:
The computed standard deviation of student A is 2.24, student B is 4.12, student C is 4.36, and student D is 3.39. This shows that student A has the lowest computed sd.
Since student A has the lowest computed sd among the four students, student A has the most consistent scores.
Note: In comparing two or more sets of data using the standard deviation, the set with the lowest computed sd can be concluded to have the most consistent or homogeneous elements among the sets.
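The figures in Example 2 can be reproduced with Python's `statistics` module. The handout's values match the sample standard deviation (divisor n − 1), so `statistics.stdev` is used; the dictionary layout is just an illustrative choice.

```python
import statistics

scores = {
    "A": [20, 24, 26, 22, 23],
    "B": [18, 26, 28, 20, 23],
    "C": [30, 24, 22, 19, 20],
    "D": [27, 25, 20, 19, 24],
}

# (mean, sample standard deviation rounded to 2 decimals) for each student
summary = {
    student: (statistics.mean(marks), round(statistics.stdev(marks), 2))
    for student, marks in scores.items()
}
for student, (avg, sd) in summary.items():
    print(f"Student {student}: mean = {avg}, sd = {sd}")

# The lowest sd marks the most consistent set of scores.
most_consistent = min(summary, key=lambda student: summary[student][1])
print("Most consistent:", most_consistent)
```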
The mathematical equation of the normal curve was derived by De Moivre in 1733. An extensive study was made by Gauss; thus, it is sometimes called the Gaussian distribution in his honor.
[Figure: normal curve with the horizontal axis marked at −3s, −2s, −1s, 0, 1s, 2s, 3s]
It is completely specified by two parameters: the mean, µ, and the standard deviation, σ.
Properties of the normal curve:
1.) The tails of the curve are asymptotic to the horizontal axis.
2.) The curve is symmetric about the mean (x̄).
3.) The mean, median, and mode have the same value and are therefore located at the same point (the center) along the horizontal axis.
4.) The total area under the normal curve is 1 (100%).
5.) The curve is divided into 3 standard deviations on each side of the mean (from −3s to 3s).
Note:
1.) The normal curve is a theoretical model, a kind of frequency polygon that is perfectly symmetrical and smooth.
2.) The area under the normal curve represents probability.
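Example 1.) Suppose a quiz in a class of 50 students consisting of 25 boys and 25 girls has the following results:
Boys: x̄ = 78, sd = 3, n = 25     Girls: x̄ = 78, sd = 4, n = 25
[Figure: two normal curves centered at 78; the boys' axis is marked 69, 72, 75, 78, 81, 84, 87 (steps of sd = 3) and the girls' axis is marked 66, 70, 74, 78, 82, 86, 90 (steps of sd = 4)]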
To find the proportion of the total area greater than, between, or less than an empirical value, we use a standardizing technique that transforms the original score into units of standard deviation. We convert original scores into standard scores, or z-scores. This means that the empirical distribution is standardized into the theoretical normal curve. The standardized theoretical normal curve has a mean of 0 (x̄ = 0) and a standard deviation of 1 (sd = 1).
Formula:
$$Z = \frac{x - \bar{x}}{sd}$$
Where:
x – empirical value
x̄ – mean
sd – standard deviation
Application of the Normal Curve
1.) In a statistics examination, the mean grade is 78 and the standard deviation is 10. Find the following:
a.) The corresponding z-scores of two students whose grades are 93 and 62, respectively.
Solution:
b.) The grades of two students whose z-scores are 0.6 and 1.2, respectively.
Solution:
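A short Python sketch of the z-score formula and its inverse (Z = (x − x̄)/sd solved for x), applied to the numbers in problem 1. The function names are hypothetical.

```python
def z_score(x, mean, sd):
    """Z = (x - mean) / sd: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """The same formula solved for x: x = mean + z * sd."""
    return mean + z * sd


MEAN, SD = 78, 10
for grade in (93, 62):
    print(f"grade {grade}: z = {z_score(grade, MEAN, SD):.2f}")
for z in (0.6, 1.2):
    print(f"z = {z}: grade = {raw_score(z, MEAN, SD):.1f}")
```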
3.) Find the area under the normal curve which lies:
d. to the left of z = 1.35
Solution:
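Table lookups can be cross-checked with `statistics.NormalDist` from Python's standard library (Python 3.8+). The table at the end of this handout lists the area between 0 and z, so a hypothetical helper subtracts 0.5 from the cumulative probability; the area to the left of a positive z is 0.5 plus the table value.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)


def area_zero_to(z):
    """Area between 0 and z, the quantity listed in the z-table."""
    return abs(standard.cdf(z) - 0.5)


def area_left_of(z):
    """Area to the left of z (cumulative probability)."""
    return standard.cdf(z)


print(round(area_zero_to(1.35), 4))  # 0.4115, row 1.3, column 0.05 of the table
print(round(area_left_of(1.35), 4))  # 0.5 + 0.4115 = 0.9115
```

For a non-standard normal variable, such as the cholesterol levels in problem 5, `NormalDist(mu=200, sigma=25).cdf(x)` gives the area to the left of a level x directly, without standardizing first.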
5.) If adult male cholesterol is normally distributed with µ = 200 and σ = 25, what is the probability of selecting a male whose cholesterol is:
7.) A group test was administered and had the following results:
Subject   Mean   Standard Deviation   Zac's Score
Physics   60     10                   85
Mat 3     55     9                    58
Eng 3     45     8                    40
a.) What percentage of the students taking the test did Zac surpass in each subject?
b.) If there are 200 students who took the test, how many of them did he surpass in each subject?
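A Python sketch for checking problem 7, assuming the three numeric columns are the mean, the standard deviation, and Zac's score (the column meaning is an assumption). The proportion Zac surpassed is the area to the left of his z-score; the function name is hypothetical.

```python
from statistics import NormalDist

# Assumed meaning of the table columns: (mean, standard deviation, Zac's score).
RESULTS = {
    "Physics": (60, 10, 85),
    "Mat 3": (55, 9, 58),
    "Eng 3": (45, 8, 40),
}
TAKERS = 200


def share_surpassed(mean, sd, score):
    """Proportion of a normal distribution lying below the given score."""
    z = (score - mean) / sd
    return NormalDist().cdf(z)


for subject, (mean, sd, score) in RESULTS.items():
    share = share_surpassed(mean, sd, score)
    print(f"{subject}: z = {(score - mean) / sd:.2f}, surpassed {share:.2%}, "
          f"about {round(share * TAKERS)} of {TAKERS} students")
```

Because this uses the exact normal CDF rather than a z rounded to two decimals, its percentages may differ slightly in the last digit from a table lookup.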
Solutions:
Physics:
SKEWNESS
The shape of the graph of the distribution is another important property of the distribution. The data are either symmetrical (with a bell-shaped curve) or not. If a set of data is not symmetrical, it is called asymmetrical or skewed.
A skewed distribution is one that has the largest number of values on either the right or the left side.
$$\text{Skewness } (Sk) = \frac{3(\bar{x} - Md)}{s}$$
Where:
x̄ = the sample mean
Md = the sample median
s = the sample standard deviation
Types of Skewness
[Figure: a positively skewed curve (Mo < Md < x̄) and a negatively skewed curve (x̄ < Md < Mo)]
Note:
For a perfectly normal distribution, Sk = 0. When Sk > 0, the distribution is positively skewed. When Sk < 0, the distribution is negatively skewed.
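A minimal Python sketch of the skewness formula Sk = 3(x̄ − Md)/s, assuming s is the sample standard deviation (`statistics.stdev`). It is applied to student C's scores from Example 2; the function names are hypothetical.

```python
import statistics


def pearson_skewness(data):
    """Sk = 3(mean - median) / s, with s the sample standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)


def describe(sk):
    """Interpret the sign of Sk as in the note above."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical (Sk = 0)"


student_c = [30, 24, 22, 19, 20]  # student C's scores from Example 2
sk = pearson_skewness(student_c)
print(round(sk, 2), describe(sk))
```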
KURTOSIS
Another property that describes the shape of a frequency distribution curve is kurtosis. It describes the extent of peakedness or flatness of the distribution of the data. This can be measured by the coefficient of kurtosis (K). The kurtosis is 3 for a mesokurtic curve, greater than 3 for a leptokurtic curve, and less than 3 for a platykurtic curve.
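Measure of Kurtosis
Ungrouped data:
$$\text{Kurtosis } (K) = \frac{\sum (x_i - \bar{x})^4 / n}{s^4}$$
Where xi = value of the ith observation, x̄ = the sample mean, and n = the sample size.
Grouped data:
$$\text{Kurtosis } (K) = \frac{\sum f_i (x_i - \bar{x})^4 / n}{s^4}$$
Where i = ith class interval, xi = class mark of the ith class interval, fi = frequency of the ith class interval, x̄ = sample mean, and n = sample size.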
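A minimal Python sketch of the kurtosis coefficient covering both the ungrouped and grouped forms. It assumes s is computed with divisor n (the moment form); if s is meant to be the n − 1 sample value, K comes out slightly different. Function names are hypothetical; the data are student A's scores from Example 2.

```python
def kurtosis(values, freqs=None):
    """K = [sum f(x - mean)^4 / n] / s^4; leave freqs as None for ungrouped data.

    Assumption: s^2 is taken with divisor n (moment form), not n - 1.
    For grouped data, values are the class marks and freqs the class frequencies.
    """
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped: every value counts once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    s2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(values, freqs)) / n
    return m4 / s2 ** 2                    # s^4 = (s^2)^2


def kurtosis_type(k):
    """Classify K against the mesokurtic value of 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"


student_a = [20, 24, 26, 22, 23]  # student A's scores from Example 2
k = kurtosis(student_a)
print(round(k, 2), kurtosis_type(k))
```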
Types of Kurtosis
[Figure: leptokurtic and platykurtic curves]
College of Arts & Sciences Mathematics & Physics Department
Areas under the standard normal curve from 0 to z
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990