Descriptive Statistics
Through the centuries, people have been interested in determining birth rates, mortality
rates, population sizes, the occurrence of certain events within a certain period of time, trade,
crop yields, figures on tax returns, and the frequency of failures in school entrance tests, among
many others.
These activities deal with counts or numerical measures of activities, events, and
things, which are called “statistics” in a limited sense.
Today the use of statistics has extended to such things as theatre attendance, basketball
results, car sales, heights, weights, and so many others that can be expressed numerically.
In counting activities, events, and things, the measurements collected from the
original information are called raw data.
These data may be treated by statistical methods that are used to describe (descriptive
statistics), to relate or associate, and to make inferences (inferential statistics).
STATISTICS
3. Analysis of Data – pertains to the process of extracting from the given data relevant
information from which a numerical description can be formulated. It is the resolution of
information into simpler elements by the application of statistical principles. This may
involve the use of any method of statistics, the choice of which depends upon the nature
or purpose of the research problem.
4. Interpretation of Data – refers to the task of drawing conclusions from the analysed
data.
It is wise to remember that your choice of the statistics to use in testing a hypothesis
depends upon the nature of your data.
Descriptive Statistics
- is concerned with the gathering, classification, and presentation of data and the
collection of summarizing values to describe group characteristics of the data. The summarizing
values most commonly used in descriptive statistics are the measures of central tendency,
variability, and of skewness and kurtosis.
Inferential Statistics
Sample - the few members of the population chosen to represent its characteristics or traits;
the representative part of the population, used to describe the population from which it was
taken.
The VARIABLE
The term variable refers to the characteristic or property whereby the members of the
group or set vary or differ from one another. For instance, the members of a group may vary in
sex, age, colour, attitude, intelligence and others. Labels or numerals may be used to name a
variable.
Variables are classified into independent and dependent with respect to their functional
relationship. For example: if you treat variable y as a function of variable x , then x is your
independent variable and y is your dependent variable. This means that the value of y (say
Academic Achievement) depends on the value of x (say Mental Ability).
Example:
Sex (male and female), Marital Status (single, married, etc.), Religious Affiliation
(Catholic, SDA, Protestant, etc.). The groups formed can be identified by using numbers like 0 or
1. Since these numbers are merely used for identification purposes, no meaning can be
attached to the magnitude or size of such numbers.
Other examples of the use of nominal variable are assignments of numbers to basketball
players, houses, office rooms, and telephones, of which these numbers are just labels or codes
for categories.
Many statistical writers use the word qualitative to describe nominal variables which
cannot be ordered.
2. ORDINAL variable/measurement
Ordinal variables not only classify but also order the classes. This is a property defined
by an operation whereby members of a particular group are ranked.
Examples of ordinal variables are the ranks 1, 2, 3, 4, 5 given by judges to the five
finalists in a beauty contest. Ordinal measures can be ordered, but the degree of difference
between any two consecutive ordinal measures cannot be determined.
(Other examples: very high (VH), high (H), above average (AA), below average (BA), low
(L), very low (VL).)
3. INTERVAL variable/measurement
4. RATIO variable/measurement
This variable differs from the interval variable in only one aspect: it has a true zero value,
which indicates a total absence of the property being measured.
Example: 0°C is different from 0 K. Zero on the Celsius scale is an arbitrary point (interval
measurement), while zero on the Kelvin scale is a true zero (ratio measurement). Other examples
are weight, length, age, and number of children in a family. With ratio measurement,
multiplication and division have meaning.
Subscript - is a number or letter representing several numbers placed at the lower right
of the variable.
Where, i = 1, 2, 3, 4, and 5
Summation: ∑
Example: $\sum_{i=1}^{n} X_i$
This is the summation of $X_i$ where i runs from 1 to n. It indicates that we get
$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$
The subscript i may be omitted together with n if no ambiguity will result. Thus we may
write ∑X to mean the sum of all X, $\sum X = X_1 + X_2 + X_3 + \dots + X_n$, and ∑XY to mean
the sum of all the products XY, $\sum XY = X_1Y_1 + X_2Y_2 + X_3Y_3 + \dots + X_nY_n$
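The summation notation can be checked with a few lines of Python. This is only a minimal sketch: the values of X and Y are made up for illustration and do not come from these notes.

```python
# Hypothetical values, chosen only to illustrate the notation.
X = [2, 4, 6]
Y = [1, 3, 5]

# ΣX = X1 + X2 + X3
sum_x = sum(X)

# ΣXY = X1*Y1 + X2*Y2 + X3*Y3 (sum of the paired products)
sum_xy = sum(x * y for x, y in zip(X, Y))

print(sum_x)   # 12
print(sum_xy)  # 2 + 12 + 30 = 44
```

Note that ∑XY pairs each X with the Y of the same subscript; it is not the same as (∑X)(∑Y).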
Used when the objective is to determine the cause and effect relationship of certain
phenomena under controlled conditions.
PRESENTATION OF DATA
Data collected must be organized in order to show significant characteristics. They can be
presented in three (3) forms.
1. Textual Presentation
Where data are presented in paragraph form. Example: As measured by the personal
consumption expenditures in the Gross National Product, consumers spent only P38.8 billion
this year (2008), up a meager 6.2 percent from P36.53 billion last year (2007). The previous year
(2006) personal consumption expenditures were likewise sluggish, growing by only 5.74
percent.
2. Tabular presentation
Tabular presentation is where data are presented in rows and columns. Tabulation is the
process of condensing classified data and arranging them in a table. The arrangement of
the gathered data by categories, together with their corresponding class frequencies and
class marks or midpoints, is called a Frequency Distribution Table (FDT).
FREQUENCY DISTRIBUTION TABLE (FDT)
The table should be headed by a number and a title to give the reader an idea of the
nature of the data being organized.
Given the following scores of fourth year students in Math test, make a frequency
distribution table.
Raw Scores: 25 31 25 29 31 25 40 35 26 34 40 27 38 32 39
Score Frequency
40 2
39 1
38 1
37 0
36 0
35 1
34 1
33 0
32 1
31 2
30 0
29 1
28 0
27 1
26 1
25 3
Total 15
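The same ungrouped table can be produced by counting each score. The sketch below uses Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

raw_scores = [25, 31, 25, 29, 31, 25, 40, 35, 26, 34, 40, 27, 38, 32, 39]
counts = Counter(raw_scores)

# One row per possible score from highest to lowest,
# including scores that did not occur (frequency 0).
table = [(score, counts.get(score, 0))
         for score in range(max(raw_scores), min(raw_scores) - 1, -1)]

print("Score  Frequency")
for score, f in table:
    print(f"{score:>5}  {f:>9}")
print(f"Total  {sum(f for _, f in table):>9}")
```

The total of the frequency column should equal the number of raw scores, which is a quick check that no score was missed.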
These ranges of the variable are known as “class intervals”. The data are condensed
into a smaller number of groups or categories. A class interval is a grouping or category
defined by a lower limit and an upper limit.
Example: An arbitrary grouping such as 9-12 is called a class interval or simply a class. The end
numbers 9 and 12 are called class limits, lower and upper limits, respectively.
But the real or exact lower limit is 0.5 below the lower limit and the real or exact upper
limit is 0.5 above the upper limit.
In a class interval of 9-12, the exact lower limit is 8.5 and the exact upper limit is 12.5.
Construct a frequency distribution table (FDT) given the following data or scores.
47 56 42 28 56 65 53 82 42 62 57 38
62 52 66 39 55 47 54 55 54 48 52 47
44 56 37 42 50 62 48 56 42 60 41 72
48 78 68 68 N=40
R = 82 – 28 = 54
Determine the class size or interval size, i, by dividing the range by the desired number
of classes. The ideal number of class intervals ranges from 8 to 15. Therefore, divide the
range, R, in such a way that the resulting number of class intervals falls between 8 and 15.
Here, 11 classes are desired:
i = 54 ÷ 11
i = 4.9
i = 5 (rounded)
Determine or organize the class limits of the class intervals. Tabulation is facilitated if
the lower class limits of the class intervals are multiples of the class size, and the bottom
interval must include the lowest score.
In our data, the lowest score is 28 and the interval size, i, is 5. Since 28 is not divisible
by 5, the bottom class interval starts at the nearest lower multiple of 5, with 25 as its lower
limit and 29 as its upper limit. So the bottom class interval is 25-29, which contains the
lowest score, 28.
Class Interval
80-84
75-79
70-74
65-69
60-64
55-59
50-54
45-49
40-44
35-39
30-34
25-29
There are 12 categories (ideal number is 8-15).
The lower limits are divisible by the class size, i.
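The steps for the range, class size, and class limits can be sketched in Python as follows. The choice of 11 desired classes follows the division shown in the text; the variable names are illustrative assumptions.

```python
scores = [
    47, 56, 42, 28, 56, 65, 53, 82, 42, 62, 57, 38,
    62, 52, 66, 39, 55, 47, 54, 55, 54, 48, 52, 47,
    44, 56, 37, 42, 50, 62, 48, 56, 42, 60, 41, 72,
    48, 78, 68, 68,
]

desired_classes = 11                                  # assumed, as in i = 54 / 11
score_range = max(scores) - min(scores)               # R = 82 - 28 = 54
class_size = round(score_range / desired_classes)     # 4.9 rounds to 5

# Lower limits are multiples of the class size; the bottom class
# must contain the lowest score and the top class the highest score.
bottom_lower = (min(scores) // class_size) * class_size   # 25
top_lower = (max(scores) // class_size) * class_size      # 80

intervals = [(lower, lower + class_size - 1)
             for lower in range(top_lower, bottom_lower - 1, -class_size)]

for lower, upper in intervals:
    print(f"{lower}-{upper}")
print(len(intervals), "class intervals")
```

If the resulting number of intervals falls outside 8 to 15, a different number of desired classes would be tried.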
4. Tally each score under the class interval to which it belongs. Count the tallies, record
the count in the frequency (f) column, and add the frequencies. The sum is equal to the
number of cases, N.
Class Interval   Tally       f
80-84            I           1
75-79            I           1
70-74            I           1
65-69            IIII        4
60-64            IIII        4
55-59            IIIII-II    7
50-54            IIIII-I     6
45-49            IIIII-I     6
40-44            IIIII-I     6
35-39            III         3
30-34                        0
25-29            I           1
                             N=40
5. Determine the midpoints by averaging the lower limits and the upper limits.
Class Interval   Tally       f    Midpoint
80-84            I           1    82
75-79            I           1    77
70-74            I           1    72
65-69            IIII        4    67
60-64            IIII        4    62
55-59            IIIII-II    7    57
50-54            IIIII-I     6    52
45-49            IIIII-I     6    47
40-44            IIIII-I     6    42
35-39            III         3    37
30-34                        0    32
25-29            I           1    27
                             N=40
GDD, Panabo National High School
6. Determine the cumulative frequency for “less than” (<cf) and “greater than” (>cf).
Class Interval   Tally       f    Midpoint   <cf   >cf   RF (%)
80-84            I           1    82         40    1     2.5
75-79            I           1    77         39    2     2.5
70-74            I           1    72         38    3     2.5
65-69            IIII        4    67         37    7     10
60-64            IIII        4    62         33    11    10
55-59            IIIII-II    7    57         29    18    17.5
50-54            IIIII-I     6    52         22    24    15
45-49            IIIII-I     6    47         16    30    15
40-44            IIIII-I     6    42         10    36    15
35-39            III         3    37         4     39    7.5
30-34                        0    32         1     39    0
25-29            I           1    27         1     40    2.5
                             N=40                        100%
Final Table
Table 5
Grouped Frequency, Cumulative & Relative Frequency Distribution of the Statistics Test Scores
of 40 M.A. Students
Class Interval   Tally       f    Midpoint   <cf   >cf   RF (%)
80-84            I           1    82         40    1     2.5
75-79            I           1    77         39    2     2.5
70-74            I           1    72         38    3     2.5
65-69            IIII        4    67         37    7     10
60-64            IIII        4    62         33    11    10
55-59            IIIII-II    7    57         29    18    17.5
50-54            IIIII-I     6    52         22    24    15
45-49            IIIII-I     6    47         16    30    15
40-44            IIIII-I     6    42         10    36    15
35-39            III         3    37         4     39    7.5
30-34                        0    32         1     39    0
25-29            I           1    27         1     40    2.5
                             N=40                        100%
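The frequency, midpoint, cumulative frequency, and relative frequency columns can all be derived from the 40 raw scores. The following is a minimal sketch, assuming the class size of 5 and the class limits 25-29 up to 80-84 used in this example; the dictionary keys are illustrative names.

```python
scores = [
    47, 56, 42, 28, 56, 65, 53, 82, 42, 62, 57, 38,
    62, 52, 66, 39, 55, 47, 54, 55, 54, 48, 52, 47,
    44, 56, 37, 42, 50, 62, 48, 56, 42, 60, 41, 72,
    48, 78, 68, 68,
]
class_size = 5
n = len(scores)

# Build the rows from the lowest class (25-29) to the highest (80-84).
rows = []
for lower in range(25, 85, class_size):
    upper = lower + class_size - 1
    f = sum(lower <= s <= upper for s in scores)
    rows.append({"interval": (lower, upper), "f": f,
                 "midpoint": (lower + upper) / 2})

# "Less than" cumulative frequency: accumulate from the lowest class upward.
running = 0
for row in rows:
    running += row["f"]
    row["less_cf"] = running

# "Greater than" cumulative frequency: accumulate from the highest class downward.
running = 0
for row in reversed(rows):
    running += row["f"]
    row["greater_cf"] = running

# Relative frequency as a percentage of N.
for row in rows:
    row["rf"] = 100 * row["f"] / n

# Print with the highest class on top, as in the table.
for row in reversed(rows):
    lower, upper = row["interval"]
    print(f"{lower}-{upper}  f={row['f']}  M={row['midpoint']:.0f}  "
          f"<cf={row['less_cf']}  >cf={row['greater_cf']}  RF={row['rf']}%")
```

The top class should show <cf equal to N and the bottom class should show >cf equal to N, which gives a simple check on the accumulation.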
_____________________________________________________________________________
Exercise: Based on the raw data given, construct a grouped frequency distribution table.
Given data: 30 43 33 51 67 50 30 48 48 39
65 37 42 24 53 80 61 49 46 28
47 49 52 42 40 50 42 47 61 68
70 38 20 56 75 18 38 29 58 59
N = 40
ASSIGNMENT:
1. Construct a 3x2 frequency distribution table for nominal data of LRC students,
SY 2020-2021. Make your own table title.
2. Construct a 4x2 frequency distribution table for nominal data of Panabo National
High School Teachers, SY 2020-2021. Make your own table title.
3. Construct two frequency distribution tables for ordinal data. Use sections Fauna and
Flora in the collection of your data. The characteristic being presented should have at
least five categories. Make your own table title.
4. Construct a frequency distribution table for interval data (ungrouped). Provide your
own data variable. Make your own table title.
5. Construct a combined frequency distribution table for nominal data and interval data
(ungrouped). Make your own table title.
44 45 48 37 40 43 45 45 41 38 46
50 45 41 32 34 55 47 48 50 43 53
42
N = 34
Graph is a geometric or pictorial image of a set of data (picture or numerical data). The
data can be graphically presented according to their scale or level of measurements. Graphical
presentations are:
@ Graphs enable readers to easily grasp the essential facts that numerical data intend to convey.
@ Graphs can easily attract attention and are more readily understood. They can be enhanced by the
use of colors and pictorial diagrams.
@ Graphs simplify concepts that would otherwise have been expressed in so many words.
In constructing a frequency polygon, each class frequency is plotted directly above the midpoint
of its class (for grouped data) and above the score (for ungrouped data). Lines are then drawn to connect
the points.
To close the polygon an additional class or score is added at both ends of the distribution and the
ends of the graph are brought down to the baseline at the midpoint or score of the additional classes.
Make a frequency polygon or line graph of the scores in math of 4th year students.
Scores: 25 40 31 25 40 27 38 35 25 26 29 32 31 34 39
Table 3. Scores of
Fourth Year Students in Math
Score Frequency
40 2
39 1
38 1
37 0
36 0
35 1
34 1
33 0
32 1
31 2
30 0
29 1
28 0
27 1
26 1
25 3
Total 15
[Figure: frequency polygon of the scores; vertical axis: f; horizontal axis: Scores, 24 to 41]
A histogram is a graphical presentation of a frequency distribution using rectangles
constructed over the lower and upper limits of each class interval. It consists of a set of
rectangles whose bases lie on the horizontal axis and are centered on the class marks (midpoints).
The base width corresponds to the class size, and the height of each rectangle corresponds
to the class frequency or relative frequency.
Example :
25 30 41 20 11 35 44 26 17 28
48 31 9 19 39 34 15 22 32 36
29 6 13 27 33 26 37 26 24 24
[Figure: HISTOGRAM; vertical axis: Frequency, 0 to 7; horizontal axis: Midpoints, 2 to 52 in steps of 5]
Social Class          f
Very High, VH         7
High, H              16
Above Average, AA    25
Below Average, BA    47
Low, L               40
Very Low, VL         27
[Figure: bar graph of the social class frequencies; vertical axis: f, 0 to 50; horizontal axis: VL, L, BA, AA, H, VH]
35 51 41 46 40 46 40 44 32 38 44 44
45 48 37 40 43 45 45 40 38 46 50
45 41 30 34 55 47 48 50 43 53 42 N = 34
BAR GRAPH
The bar graph is the most common and widely used graphical device. It consists of bars or
heavy lines of equal width, either all vertical or all horizontal. The X to Y ratio is 4:3. The
length of each bar represents the magnitude of the quantity being compared. The space between
bars is equal, and not less than one-half of the bar width. It is used to represent data not
arranged in a step distribution.
Activity: Make a bar graph of the Marital Status Distribution of Graduate Students drawn
horizontally. Use the space below.
Exercise: Make a Bar Graph for Ordinal Data. The characteristic being presented should have
at least three categories. Gather data in your own section. Use the space below.
Enrolment F RF (%)
Elementary 300 25
High School 400 33
College 500 42
Total 1200 100
[Figure: pie chart with slices Elementary 25%, High School 33%, College 42%]
Figure 3.2. Enrolment from elementary to college students of North Davao Colleges,
SY 2011-2012
Exercise: Make your own example (at least three categories) of a pie chart or circle graph for
nominal data.
RANKING OF SCORES
Ranking is the arranging of scores from highest to lowest and the assigning of consecutive
numbers: 1 to the highest, 2 to the next highest, and so on until the last score is covered. In
case of tied scores, the average of their initial ranks serves as the final rank.
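The tie rule can be illustrated with a short Python function. This is a sketch only: `rank_scores` is a hypothetical helper name, and the four scores are made up for illustration.

```python
def rank_scores(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores
    receive the average of the ranks they initially occupy."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return {score: sum(p) / len(p) for score, p in positions.items()}

# Hypothetical scores: the two 85s initially occupy ranks 2 and 3.
ranks = rank_scores([90, 85, 85, 80])
print(ranks)  # {90: 1.0, 85: 2.5, 80: 4.0}
```

The two tied scores share rank 2.5, and the next score takes rank 4, so no rank position is lost or counted twice.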
34 26 36 24 27 20 26 26 30 41 35 15 28 29 39 19 33
11 32 24 17 13 41 9 37 44 22 25 48 6
      0    1    2    3    4    5    6    7    8    9    Total
4          II             I                   I         4
3     I         I    I    I    I    I    I         I    8
2     I         I         II   I    III  I    I    I    11
1          I         I         I         I         I    5
0                                   I              I    2
Scores:46 38 44 22 26 27 25 29 29 25
20 40 36 38 30 45 32 44 44 41
A single figure which is representative of the general level of magnitudes or values of the
items in a set of data. This figure is used to represent all the numbers in the set of data. When
arranged according to magnitude, it tends to lie centrally within the set.
The scores or values that tend to locate themselves at the center of a rank-ordered or step
distribution.
2. Median, x͂ – the middle most score or value above which lie 50% of the distribution and
below which lie the remaining 50% of the distribution. The center most score in a distribution.
Where x = scores
Ʃx = sum of scores
Meaning, arrange the scores from highest to lowest or vice-versa. The 4th score is the median.
Scores:89 84 83 81 78 75 70
1st 2nd 3rd 4th 5th 6th 7th
The 4th score is 81.
Median, x̃ = 81
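The position rule for an odd number of scores can be sketched in Python and checked against the standard library's `statistics.median`. The sketch assumes N is odd, as in this example; for an even N the two middle scores would be averaged instead.

```python
import statistics

scores = [89, 84, 83, 81, 78, 75, 70]

ordered = sorted(scores, reverse=True)        # highest to lowest
middle_position = (len(ordered) + 1) // 2     # (7 + 1) / 2 = 4th score
median = ordered[middle_position - 1]         # lists are 0-indexed

print(middle_position, median)                # 4 81
print(statistics.median(scores))              # 81
```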
Data: 41 44 17 34 22 06 26 24 20 26 19 32 13 33 25
39 11 48 09 36 27 26 30 35 28 41 15 29 37 24
Please Solve.
Example: Compute the mean, median, and mode of the following scores or set of data.
25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
Formula:
• Mean, x̅ = A.M. + (ΣFd ÷ N) i
Data: 25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
R = HS-LS
R = 48 – 6 i = R/8
R = 42 i = 42 ÷ 8
i = 5.25
i = 5, approximately
Md, x͂ = 27.36
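The grouped mean and median for these 30 scores can be reproduced with the sketch below. It assumes class intervals 5-9 up to 45-49 (i = 5) and an assumed mean (A.M.) of 27, the midpoint of the class 25-29; the median uses the interpolation L.L. + [(N/2 – cf) ÷ f] i. Variable names are illustrative.

```python
scores = [
    25, 37, 13, 32, 19, 28, 35, 30, 26, 27, 36, 34, 48, 44, 41,
    24, 33, 29, 15, 41, 26, 20, 24, 26, 6, 22, 9, 17, 11, 37,
]
i = 5                                                     # class size
classes = [(lower, lower + i - 1) for lower in range(5, 50, i)]   # 5-9 ... 45-49
freqs = [sum(lower <= s <= upper for s in scores) for lower, upper in classes]
n = sum(freqs)

# Mean by the assumed-mean method: mean = A.M. + (sum of f*d / N) * i
assumed_mean = 27                                         # assumption: midpoint of 25-29
deviations = [((lower + upper) // 2 - assumed_mean) // i for lower, upper in classes]
sum_fd = sum(f * d for f, d in zip(freqs, deviations))
mean = assumed_mean + (sum_fd / n) * i

# Median: locate the class containing the N/2-th case, then interpolate.
cf = 0                                                    # cumulative frequency below the class
for (lower, upper), f in zip(classes, freqs):
    if cf + f >= n / 2:
        median = (lower - 0.5) + ((n / 2 - cf) / f) * i
        break
    cf += f

print(round(mean, 2))    # 27.17
print(round(median, 2))  # 27.36
```

Any class midpoint may serve as the assumed mean; choosing one near the center only keeps the deviations small.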
COMPUTATION OF THE MODE:
Mode, x̂ – the midpoint of the class or step interval having the highest frequency.
QUANTILES – are a natural extension of the median concept in that they are values which
divide a set of data into equal parts. While the median divides the distribution into two (2)
equal parts, the quantiles divide the distribution into four (4), ten (10), or one hundred
(100) equal parts.
1. Quartile – the quantile which divides the distribution into four (4) parts or quarters.
2. Decile – the quantile which divides the distribution into ten (10) parts.
3. Percentile – the quantile which divides the distribution into one hundred (100) parts.
QUARTILES – are values that divide the distribution into four quarters.
Q1 – score or value below which lie one-fourth (1/4) or 25% of the cases.
Q2 – score or value below which lie one-half (1/2) or 50% of the cases also known as
median.
Q3 – score or value below which lie three-fourths (3/4) or 75% of the cases.
Q4 – score or value below which lie one (1) or 100% of the cases.
Data: 25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
H.S.=48 L.S.=6 R=42 i=5
Table:
Class Interval    f    <cf
45-49             1    30
40-44             3    29
35-39             4    26
30-34             4    22
25-29             7    18
20-24             4    11    (class containing Q1)
15-19             3     7
10-14             2     4
5-9               2     2
                  N = 30
Solve Q1: Q1 = L.L. + [(N/4 – cf) ÷ f ] i
Solve Q2: Q2 = L.L. + [(N/2 – cf) ÷ f ] i
Solve Q3: Q3 = L.L. + [(3/4N – cf) ÷ f ] i
Where: L.L. = exact lower limit of class interval containing Q1, Q2, or Q3
cf = cumulative frequency of the class interval below the location of
Q1, Q2, or Q3
f = frequency of the class interval containing Q1, Q2, or Q3
For Q1:
Q1 = L.L. + [(N/4 – cf) ÷ f ] i
Q1 = 19.5 + [(30/4 – 7)÷4]5
Q1 = 19.5 + [(7.5 – 7)÷4]5
Q1 = 19.5 + (0.5÷4)5
Q1 = 19.5 + 0.625
Q1 = 20.125
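The quartile formula can be written once and reused for Q1, Q2, and Q3. The function below is a minimal sketch; `grouped_quantile` is a hypothetical name, and the frequencies are those of the table for the 30 scores, listed from the lowest class upward.

```python
def grouped_quantile(classes, freqs, fraction, class_size):
    """Interpolated quantile for grouped data:
    L.L. + [(fraction * N - cf) / f] * i, with classes in ascending order."""
    n = sum(freqs)
    target = fraction * n
    cf = 0                                   # cumulative frequency below the class
    for (lower, _upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= target:
            exact_lower = lower - 0.5        # exact lower limit, L.L.
            return exact_lower + ((target - cf) / f) * class_size
        cf += f
    raise ValueError("fraction must be between 0 and 1")

classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29),
           (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [2, 2, 3, 4, 7, 4, 4, 3, 1]

q1 = grouped_quantile(classes, freqs, 0.25, 5)
q2 = grouped_quantile(classes, freqs, 0.50, 5)
q3 = grouped_quantile(classes, freqs, 0.75, 5)
print(q1, round(q2, 2), q3)  # 20.125 27.36 35.125
```

Q2 agrees with the median of the same data, as expected from the definition of the second quartile.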
PERCENTILES - are scores or values below which a certain percentage of the cases lie.
Example: Solve P60 for the following list of scores resulting from a statistics examination
administered to 50 students.
Scores: 48 94 89 91 85 98 68 79 91 62 98
53 54 76 66 62 62 99 96 59 72 64 55
82 68 78 77 93 70 49 57 79 88 61 90
73 70 59 51 92 43 69 46 52 100 73 61
93 83 58
$P_{60} = \frac{60(50)}{100} = 30\text{th}$ score from the lowest
P60 = 77
Compute P40:
Compute P75:
Class Interval
X          f    <cf
100-104    1    50
95-99      4    49
90-94      7    45
85-89      3    38
80-84      2    35
75-79      5    33
70-74      5    28
65-69      4    23
60-64      7    19
55-59      4    12
50-54      4     8
45-49      3     4
40-44      1     1
           N = 50
$P = L.L. + \left[\frac{\%N - cf}{f}\right] i$
Where:
L.L. = exact lower limit of the interval containing the %N score from the lowest
N = number of cases
cf = cumulative frequency of the interval below the location of %N
f = frequency of the class interval containing the %N
i = class size
%N = 50%(N) = 50%(50) = 25
Therefore, %N = 25 is at class interval 70-74. The exact lower limit, L.L., is 69.5.
The cf = 23 and f = 5.
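Substituting these values into the percentile formula completes the computation. The lines below are a worked sketch using only the values just stated (L.L. = 69.5, %N = 25, cf = 23, f = 5) and the class size i = 5 from the table:

```latex
P_{50} = L.L. + \left[\frac{\%N - cf}{f}\right] i
       = 69.5 + \left[\frac{25 - 23}{5}\right] 5
       = 69.5 + 2
       = 71.5
```

So half of the 50 cases lie below a score of about 71.5.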
Scores: 48 94 89 91 85 98 68 79 91 62 98
53 54 76 66 62 62 99 96 59 72 64 55
82 68 78 77 93 70 49 57 79 88 61 90
73 70 59 51 92 43 69 46 52 100 73 61
93 83 58
43 58 68 78 91
46 59 68 79 92
48 60 69 79 93
49 61 70 82 93
51 61 70 83 94
52 62 72 85 96
53 62 73 88 98
54 62 73 89 98
55 64 76 90 99
57 66 77 30th 91 100
For Ungrouped Data: Arrange the scores. Determine the placement of the score from the
lowest.
Pr = (SP x 100%)/N Where: SP – score placement from the lowest
N – number of cases
What is the percentile rank of score 77? The Score placement of 77 is at the 30th.
Therefore:
Pr = (30 x 100%)/50
Pr = (3000%)/50
Pr = 60%
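The ungrouped percentile rank can be sketched in Python using the arranged list of 50 scores. `percentile_rank` is a hypothetical helper name; for tied scores this sketch takes the placement of the first occurrence from the lowest, which is an assumption, since the notes do not state a tie rule.

```python
def percentile_rank(scores, score):
    """Pr = (SP x 100%) / N, where SP is the placement from the lowest."""
    ordered = sorted(scores)
    placement = ordered.index(score) + 1     # 1-based placement, first occurrence
    return placement * 100 / len(ordered)

arranged = [
    43, 46, 48, 49, 51, 52, 53, 54, 55, 57,
    58, 59, 60, 61, 61, 62, 62, 62, 64, 66,
    68, 68, 69, 70, 70, 72, 73, 73, 76, 77,
    78, 79, 79, 82, 83, 85, 88, 89, 90, 91,
    91, 92, 93, 93, 94, 96, 98, 98, 99, 100,
]

pr_77 = percentile_rank(arranged, 77)
print(pr_77)  # 60.0
```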
Class Interval
X          f    <cf
100-104    1    50
95-99      4    49
90-94      7    45
85-89      3    38
80-84      2    35
75-79      5    33
70-74      5    28
65-69      4    23
60-64      7    19
55-59      4    12
50-54      4     8
45-49      3     4
40-44      1     1
           N = 50
$P_r = \frac{cf + \left[\frac{X - L.L.}{i}\right] f}{N} \times 100\%$
Where:
L.L. = exact lower limit of class interval containing the given score
f = frequency of class interval containing the score
i = class size
N = number of cases
cf = cumulative frequency of class interval below the location of the given score
X = given score whose percentile rank is desired
Score 76.5 is at class interval 75-79. The exact lower limit (L.L.) is 74.5, the f is 5, the i=5
and cf is 28 which is below the location of the given score.
$P_r = \frac{cf + \left[\frac{X - L.L.}{i}\right] f}{N} \times 100\% = \frac{28 + \left[\frac{76.5 - 74.5}{5}\right] 5}{50} \times 100\% = 60\%$
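The grouped percentile rank can be checked with a short function. This is a sketch under the assumptions of this example: classes listed from the lowest upward, class size 5, and exact limits 0.5 below and above the class limits. `grouped_percentile_rank` is a hypothetical name.

```python
def grouped_percentile_rank(x, classes, freqs, class_size):
    """Pr = {cf + [(X - L.L.) / i] * f} / N x 100%, classes in ascending order."""
    n = sum(freqs)
    cf = 0                                        # cumulative frequency below the class
    for (lower, upper), f in zip(classes, freqs):
        exact_lower, exact_upper = lower - 0.5, upper + 0.5
        if exact_lower <= x < exact_upper:
            within = ((x - exact_lower) / class_size) * f
            return 100 * (cf + within) / n
        cf += f
    raise ValueError("score lies outside the class intervals")

classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74),
           (75, 79), (80, 84), (85, 89), (90, 94), (95, 99), (100, 104)]
freqs = [1, 3, 4, 4, 7, 4, 5, 5, 2, 3, 7, 4, 1]

pr = grouped_percentile_rank(76.5, classes, freqs, 5)
print(pr)  # 60.0
```

The result agrees with the ungrouped answer: a score of 76.5 sits at the 60th percentile of the grouped distribution.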
MEASURES OF VARIABILITY
Hence, the description of a set of data becomes more meaningful if the degree of
clustering about a central point is measured. Information on how far apart the
observations in each set are from one another will be very useful.
VARIABILITY – is the degree of spread or dispersion of the scores from the mean.
Measures of Variability
1. Range – the difference between the highest and lowest scores, or between the highest class
upper limit and the lowest class lower limit. It is a crude measure of variability because it
depends only on the extreme scores. It is considered an unstable measure of variability.
Example for Ungrouped Data
45 - 49 40 - 44
5–9 15 – 19
Since the range of group A is greater than the range of group B, group A is considered a
heterogeneous group and group B is homogeneous group.
Q = (Q3 – Q1)/2
Q1 = ¼N = ¼ (6) = 1.5 th Q1 = 73
Q3 = ¾N = ¾ (6) = 4.5 th Q3 = 83.5
Group B: 93 90 88 80 76 69 N=6
Since the quartile deviation, Q, of group B is greater than the quartile deviation of group
A, group B is heterogeneous group and group A is homogeneous group.
x = individual score
x̅ = mean of the scores
∑|x − x̅| = sum of absolute deviations from the mean
Example: Compute the mean or average deviation, MAD, of the given scores.
Scores: 15 15 17 18 20 N=5
Score, x    (x − x̅)    |x − x̅|
15          -2          2
15          -2          2
17           0          0
18           1          1
20           3          3
∑x = 85                 ∑|x − x̅| = 8
x̅ = 85/5 = 17
$MAD = \frac{\sum |x - \bar{x}|}{N} = \frac{8}{5} = 1.6$
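The mean absolute deviation of the five scores can be verified with a few lines of Python. This is a minimal sketch; the arithmetic 8 ÷ 5 gives 1.6.

```python
scores = [15, 15, 17, 18, 20]
n = len(scores)

mean = sum(scores) / n                               # 85 / 5 = 17
abs_deviations = [abs(x - mean) for x in scores]     # 2, 2, 0, 1, 3
mad = sum(abs_deviations) / n                        # 8 / 5

print(mean, sum(abs_deviations), mad)                # 17.0 8.0 1.6
```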
x̅ = ∑fM/N
x̅ = 386/18 = 21.44
Scores: 89 84 83 78 76 70 N=6
x      x²
89     7921
84     7056
83     6889
78     6084
76     5776
70     4900
∑x = 480    ∑x² = 38626
$SD = \sqrt{\frac{\sum x^2 - (\sum x)^2/N}{N}} = \sqrt{\frac{38626 - (480)^2/6}{6}} = 6.14$ (population SD)
$SD = \sqrt{\frac{\sum x^2 - (\sum x)^2/N}{N-1}} = 6.72$ (sample SD; pls. compute)
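The raw-score formula can be checked against Python's standard `statistics` module, whose `pstdev` and `stdev` functions compute the population and sample standard deviations. This is a sketch; summing the squares of the six scores gives ∑x² = 38626.

```python
import math
import statistics

scores = [89, 84, 83, 78, 76, 70]
n = len(scores)

sum_x = sum(scores)                         # 480
sum_x2 = sum(x * x for x in scores)         # 38626
sum_of_squares = sum_x2 - sum_x ** 2 / n    # 38626 - 38400 = 226

population_sd = math.sqrt(sum_of_squares / n)        # divide by N
sample_sd = math.sqrt(sum_of_squares / (n - 1))      # divide by N - 1

print(round(population_sd, 2), round(sample_sd, 2))  # 6.14 6.72
print(round(statistics.pstdev(scores), 2), round(statistics.stdev(scores), 2))
```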
Class Interval    f    M     fM     fM²
45-49             1    47    47     2209
40-44             3    42    126    5292
35-39             4    37    148    5476
30-34             4    32    128    4096
25-29             7    27    189    5103
20-24             4    22    88     1936
15-19             3    17    51     867
10-14             2    12    24     288
5-9               2    7     14     98
N=30    ∑fM=815    ∑fM²=25365
$SD = \sqrt{\frac{\sum fM^2 - (\sum fM)^2/N}{N}} = \sqrt{\frac{25365 - (815)^2/30}{30}} = 10.37$ (population SD)
$SD = \sqrt{\frac{\sum fM^2 - (\sum fM)^2/N}{N-1}} = 10.54$ (sample SD; pls. compute)
Class Interval    f    d     fd    fd²
45-49             1    4     4     16
40-44             3    3     9     27
35-39             4    2     8     16
30-34             4    1     4     4
25-29             7    0     0     0
20-24             4    -1    -4    4
15-19             3    -2    -6    12
10-14             2    -3    -6    18
5-9               2    -4    -8    32
N=30    ∑fd=1    ∑fd²=129
$SD = i\sqrt{\frac{\sum fd^2 - (\sum fd)^2/N}{N}} = 5\sqrt{\frac{129 - (1)^2/30}{30}} = 10.37$ (population SD)
$SD = i\sqrt{\frac{\sum fd^2 - (\sum fd)^2/N}{N-1}} = 10.54$ (sample SD; pls. compute)
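Both grouped methods should give the same standard deviation, which the sketch below confirms. It assumes the midpoints and frequencies of the table above, a class size of 5, and deviations d measured in class-size units from the class 25-29 (midpoint 27). Summing f·M² directly, with 2 × 7² = 98 for the lowest class, gives 25365.

```python
import math

midpoints = [47, 42, 37, 32, 27, 22, 17, 12, 7]
freqs = [1, 3, 4, 4, 7, 4, 3, 2, 2]
i = 5                         # class size
origin = 27                   # midpoint of the class where d = 0
n = sum(freqs)

# Method 1: midpoint method
sum_fM = sum(f * m for f, m in zip(freqs, midpoints))
sum_fM2 = sum(f * m * m for f, m in zip(freqs, midpoints))
sd_midpoint = math.sqrt((sum_fM2 - sum_fM ** 2 / n) / n)

# Method 2: class-deviation method
d = [(m - origin) // i for m in midpoints]
sum_fd = sum(f * dev for f, dev in zip(freqs, d))
sum_fd2 = sum(f * dev * dev for f, dev in zip(freqs, d))
sd_deviation = i * math.sqrt((sum_fd2 - sum_fd ** 2 / n) / n)

# Sample SD: divide by N - 1 instead of N
sample_sd = i * math.sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))

print(sum_fM, sum_fM2, sum_fd, sum_fd2)                     # 815 25365 1 129
print(round(sd_midpoint, 2), round(sd_deviation, 2), round(sample_sd, 2))
```

The deviation method works with much smaller numbers, which is why it is convenient for hand computation.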
EXERCISES:
1. Compute the standard deviation of the given data. (Ungrouped)
2. Compute the standard deviation of the given set of scores using the two methods. (Grouped)
Kurtosis, $Ku = \frac{Q}{P_{90} - P_{10}}$, where $Q = \frac{Q_3 - Q_1}{2}$
P10 = Percentile 10
P90 = Percentile 90
Example: Determine the shape of the frequency polygon according to height (kurtosis) of the
given data.
25 37 13 32 19 28 35 30 26 27 36 34 48
44 41 24 33 29 15 41 26 20 24 26 6 22
9 17 11 39 N=30
Construct a FDT:
$P_{90} = L.L. + \left[\frac{\%N - cf}{f}\right] i = 39.5 + \left[\frac{90\%(30) - 26}{3}\right] 5 = 39.5 + 1.67 = 41.167$
$P_{10} = L.L. + \left[\frac{\%N - cf}{f}\right] i = 9.5 + \left[\frac{10\%(30) - 2}{2}\right] 5 = 9.5 + 2.5 = 12.00$
$Ku = \frac{Q}{P_{90} - P_{10}} = \frac{7.5}{41.167 - 12} = \frac{7.5}{29.167} = 0.257$
Since Ku = 0.257 < 0.263, the polygon or distribution graph is leptokurtic.
[Graph: histogram of the data; vertical axis: Frequency, 0 to 7; horizontal axis: Midpoint, 2 to 52]
[Figure: negatively skewed, normal, and positively skewed curves; vertical axis: Frequency; Mean and Median marked]
Negative Skew
1. More high scores than low scores.
2. Tail is towards the left.
3. The mean will have a lower numerical value than the median because the extremely low
scores will pull the mean to the left.
4. Mode has a high numerical value.
Normal Curve
1. Most scores are concentrated at the center of the distribution.
2. Both sides are symmetrically balanced.
3. The median, mean and mode are all located at one point.
Positive Skew
1. More low scores than high scores.
2. Tail is towards the right.
3. Mode corresponds to a low value.
4. Mean, which is sensitive to each score value, will be pulled in the direction of the
extreme scores and will have a high value.
5. Median, which is unaffected by extreme values, will have a value between the mode and the
mean.
A. The Mean is the score point on the X-axis which corresponds to the point of balance or
fulcrum of the distribution.
B. The median is the score point which bisects the total area. Half the area would fall to the left
and half to the right of an ordinate drawn at the median.
C. The mode is the score point with the greatest frequency, the point on the X-axis, which
corresponds to the tallest point of the curve.
SKEWNESS, $SK = \frac{3(\text{Mean} - \text{Median})}{SD}$
Please refer to the previous example. Compute the SKEWNESS of the given data.
25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
The computed mean = 27.17
The computed median = 27.36
The computed SD = 10.54
The Computed theoretical mode = 27.68
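$SK = \frac{3(\text{Mean} - \text{Median})}{SD} = \frac{3(27.17 - 27.36)}{10.54} = -0.054$
Since SK is negative, the distribution is slightly negatively skewed.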
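The substitution can be checked with a small Python function. This is a sketch; `pearson_skewness` is an illustrative name for the median-based coefficient 3(Mean – Median)/SD used in these notes.

```python
def pearson_skewness(mean, median, sd):
    """Median-based skewness coefficient: SK = 3(Mean - Median) / SD."""
    return 3 * (mean - median) / sd

sk = pearson_skewness(27.17, 27.36, 10.54)
print(round(sk, 3))  # -0.054
```

A value this close to zero indicates only a slight departure from symmetry; the negative sign reflects a mean lower than the median.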
[Graph: frequency curve of the data, labeled "Negatively Skewed"; vertical axis: Frequency; Mean, Median, and Mode marked]
1. The curve is symmetrical and bell-shaped. It has its highest point at the center. The lines at
sides fall off toward the opposite directions at exactly equal distances from the center.
Therefore, when the curve is folded at the middle, the two sides are perfectly of the same size
and shape.
2. The number of cases, N, is infinite. This is the reason why the curve is asymptotic to the base
line or axis, and that the curve may extend infinitely to both directions.
3. The three measures of central tendency coincide at one point at the center of the
distribution.
4. The height of the curve indicates the frequency of cases, expressed as probability, proportion
or percentage. Hence, the total area under the normal curve is 1.0 in terms of probability or
proportion and 100% in terms of percentage. Thus one-half of the area is 0.5 or 50%.
5. The basic unit of measurement is expressed in sigma unit (σ) or standard deviation (SD) along
the baseline. The sigma units are also called Z-scores (x/σ).
6. Two parameters are used to describe the curve. One is the parameter mean which is equal to
zero (µ=0) and the other is the standard deviation which is equal to 1 (σ=1).
7. Standard deviations or Z-scores departing away from the µ=0 towards the right of curve or
above the mean are expressed in positive values, while the values departing from the mean to
the left of the curve or below the mean are in negative values.
The area under the normal curve is directly related to the distance of the sigma (σ) unit
from the parameter mean (µ). The total area of the curve is 1.0 in terms of probability or
proportion, or 100% in terms of percentage.
Example: The total area located between the mean or parameter mean (µ=0) and +1.0σ or
1SD is .3413 of the total area or roughly 34.13% of the cases who scored
between the mean and 1standard deviation above the mean.
Since the curve is symmetrical, the portion between the mean (µ=0) and -1.0σ or -1SD
would also be .3413 in proportion or 34.13% in percentage of the cases.
Hence, between 1.0σ and -1.0σ or between 1SD and -1SD are included .6826 in
proportion of the total area or 68.26% in percentage of the cases represented by the area
under the normal curve.
[Figure: normal curve showing the area below the curve from µ=0 to -1σ or -1SD (A = 0.3413, or 34.13%), the area from µ=0 to 1σ or 1SD (A = 0.3413, or 34.13%), and the total area between -1σ and 1σ (0.6826, or 68.26%)]
$Z\text{-score} = \frac{x - \bar{x}}{SD}$
Where:
Z-score - the sigma unit value
x - the score value
x̅ - the mean
SD – standard deviation
Problem: In reading ability test with samples of 150 cases, the mean score is 40 and
the standard deviation is 4.0. Assuming normality of the distribution,
1. What percentage of the cases falls between the mean and a score of 46?
2. What is the probability that a score picked at random will lie above score 46?
3. What is the probability that a score will lie below score 46?
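One way to approach this problem is to convert the score to a Z-score and then look up the area under the normal curve. The sketch below replaces the printed table of areas with the standard normal cumulative function built from Python's `math.erf`; `normal_cdf` is a helper name introduced here for illustration, not part of these notes.

```python
import math

def normal_cdf(z):
    """Proportion of the area under the standard normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, score = 40, 4.0, 46
z = (score - mean) / sd                 # (46 - 40) / 4 = 1.5

between = normal_cdf(z) - 0.5           # area from the mean to z
above = 1 - normal_cdf(z)               # area above z
below = normal_cdf(z)                   # area below z

print(z)                                # 1.5
print(round(between, 4))                # 0.4332, about 43.32% of the cases
print(round(above, 4))                  # 0.0668
print(round(below, 4))                  # 0.9332
```

The three answers are linked: the area below 46 is 0.5 plus the area between the mean and 46, and the areas above and below add up to 1.0.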
NOTES IN STATISTICS