
DESCRIPTIVE STATISTICS

Nature and Meaning of Statistics
Uses of Statistics
Fields of Statistics
Descriptive Statistics
Inferential Statistics
Population and Sample
The Variable
According to Functional Relationship
According to Scale of Measurement
Subscript and Summation Notations
Methods in the Collection of Data
Presentation of Data
Frequency Distribution Table (FDT)
Steps in Constructing a Grouped FDT
Considerations in Tabular Presentation of Data
Graphical Presentation of Data
Frequency Polygon or Line Graph
Histogram
Bar Graph
Pie Chart or Circle Graph
Ranking of Scores
Measures of Central Tendency
Mean
Median
Mode
Other Measures of Location
Quartile
Percentile
Measures of Variability
Range
Quartile Deviation
Mean or Average Deviation
Standard Deviation, SD
Shapes of Frequency Polygon
Kurtosis
Skewness
Normal Curve
Areas Under the Normal Curve
The Conversion Formula
Applications
Table

GDD, Panabo National High School i


NATURE AND MEANING OF STATISTICS

Through the centuries, people have been interested in determining birth rates, mortality
rates, population counts, the occurrence of certain events in a given period of time, trade, crop
yields, figures on tax returns, and the frequency of failures in school entrance tests, among many
others.

These activities deal with counts or numerical measures of activities, events and
things, which are called “statistics” in a limited sense.

Today the use of statistics has extended to such things as theatre attendance, basketball
results, car sales, heights, weights, and many other things that can be expressed numerically.

In counting activities, events and things, the measurements collected from the
original information are called raw data.

These data may be treated by statistical methods that are used to describe (descriptive
statistics), to relate or associate, and to make inferences (inferential statistics).

STATISTICS

Statistics – a science that deals with methods for the collection or gathering, tabulation or
presentation, analysis, and interpretation of numerical or quantitative data.

1. Collection/Gathering – refers to the process of obtaining numerical measurements. It
involves getting information through interviews, questionnaires, objective observations,
experimentation, tests and other methods, translated into numerical or quantitative
data.

2. Tabulation or Presentation of Data – refers to the organization of data into tables,
graphs or charts, so that logical and statistical conclusions can be derived from the
collected measurements.

3. Analysis of Data – pertains to the process of extracting from the given data relevant
information from which a numerical description can be formulated. It is the resolution of
information into simpler elements by the application of statistical principles. This may
involve any method of statistics, the choice of which depends upon the nature
or purpose of the research problem.

4. Interpretation of Data – refers to the task of drawing conclusions from the analysed
data.



USES OF STATISTICS

1. It can give a precise description of data.

This function of statistics enables us to make accurate statements or judgments about
averages, variability and relationships. An example is describing the academic performance of a
group of pupils according to the computed mean, standard deviation, and correlation with
another factor.

2. It can predict the behavior of individuals.

In school, the grades of students can be predicted through a scholastic aptitude test. In
industry, work performance is usually predicted by an aptitude test related to that
particular type of work. A teacher’s performance may also be predicted by his or her performance
on an instrument such as a teacher aptitude test. To measure the success of a pupil, worker, or
teacher, we may have to compute measures like the mean, standard scores, percentiles, and
standard deviation, or apply other statistical methods.

3. It can be used to test a hypothesis.

We can determine whether or not a variable is related to another variable through a
test of inference, such as correlation and other statistical measures.

It is wise to remember that the choice of statistics to use in testing a hypothesis
depends upon the nature of your data.

THE FIELDS OF STATISTICS

Descriptive Statistics

- is concerned with the gathering, classification, and presentation of data and the
computation of summarizing values to describe group characteristics of the data. The summarizing
values most commonly used in descriptive statistics are the measures of central tendency,
variability, skewness, and kurtosis.

Inferential Statistics

- demands higher-order thinking, critical judgment, and mathematical methods. It
aims to give information about large groups of data. Among the topics included in the study of
inferential statistics are the testing of hypotheses using the t-test, z-test, simple linear
correlation, chi-square test, and the analysis of variance (ANOVA).



POPULATION AND SAMPLE

Population - refers to groups or aggregates of people, animals, objects, materials,
happenings or things of any form that can be described as having a unique
combination of qualities and having common characteristics.

Example: the population of teachers

the graduating students of a particular school

the employees of a company

the cars produced by a particular manufacturer

the depositors in a bank, etc.

Sample - a few members of the population chosen to represent its characteristics or traits;
the representative part of the population, used to describe the population from which it was
taken.

The measures of a population are called PARAMETERS.

The measures of a sample are called STATISTICS; they serve as ESTIMATES of the corresponding
parameters.

The VARIABLE

The term variable refers to the characteristic or property whereby the members of the
group or set vary or differ from one another. For instance, the members of a group may vary in
sex, age, colour, attitude, intelligence and others. Labels or numerals may be used to name a
variable.

VARIABLES ACCORDING TO FUNCTIONAL RELATIONSHIP

Variables are classified into independent and dependent with respect to their functional
relationship. For example: if you treat variable y as a function of variable x , then x is your
independent variable and y is your dependent variable. This means that the value of y (say
Academic Achievement) depends on the value of x (say Mental Ability).


VARIABLES ACCORDING TO SCALE OF MEASUREMENT

According to scale of measurement, variables can be categorized as Nominal, Ordinal,
Interval, or Ratio.

1. NOMINAL variable/measurement

1. The most limited type of variable.

2. It is used merely to differentiate classes or categories, purely for classification or
identification purposes.

Example:

Sex (male and female), Marital Status (single, married, etc.), Religious Affiliation
(Catholic, SDA, Protestant, etc.). The groups formed can be identified by using numbers like 0 or
1. Since these numbers are merely used for identification purposes, no meaning can be
attached to the magnitude or size of such numbers.

Other examples of the use of nominal variables are the assignment of numbers to basketball
players, houses, office rooms, and telephones, where the numbers are just labels or codes
for categories.

Many statistical writers use the word qualitative to describe nominal variables which
cannot be ordered.

2. ORDINAL variable/measurement

Ordinal variables not only classify but also order the classes. This is a property defined
by an operation whereby members of a particular group are ranked.

Examples of ordinal variables are the ranks 1, 2, 3, 4, 5 given by judges to the five
finalists in a beauty contest. Ordinal measures can be ordered, but the degree of difference
between any two consecutive ordinal measures cannot be determined.

Other examples: very high (VH), high (H), above average (AA), below average (BA), low
(L), very low (VL).

3. INTERVAL variable/measurement

This refers to a property defined by an operation which permits statements about the
equality of intervals, in addition to statements of sameness or difference and of greater than or
less than. It can differentiate between any two classes in terms of degrees of difference. Examples:
achievement scores, test scores, grades, IQ, temperature in degrees Celsius. With interval
measurements, addition and subtraction have meaning. If two pupils got 80 and 75 in a science
quiz, we can say that the sum of their scores is 155, while the difference is 5.


4. RATIO variable/measurement

This variable differs from an interval variable in only one aspect: it has a true zero value,
which indicates a total absence of the property being measured.
Example: 0°C is different from 0 K. The zero of the Celsius scale is an arbitrary reference
point (interval measurement), while 0 K is a true zero (ratio measurement). Other examples are
weight, length, age, and number of children in a family. With ratio measurement, multiplication
and division have meaning.

SUBSCRIPT AND SUMMATION NOTATIONS

Subscript - a number or letter, placed at the lower right of a variable, that identifies
the individual values the variable represents.

Example: Xi represents the ages of five students:

Xi = X1, X2, X3, X4, X5

where i = 1, 2, 3, 4, and 5.

Summation: ∑

Example: $\sum_{i=1}^{n} X_i$ is the summation of Xi where i runs from 1 to n. This
indicates the sum X1 + X2 + X3 + ... + Xn.

The subscript i may be omitted together with n if no ambiguity will result. Thus we may
write ∑X to mean the sum of all X, ∑X = X1 + X2 + X3 + ... + Xn, and ∑XY to mean the sum of all
the products XY, ∑XY = X1Y1 + X2Y2 + X3Y3 + ... + XnYn.
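The two sums can be checked with a short Python sketch. The values of X and Y below are hypothetical, chosen only for illustration; the handout does not list actual ages.

```python
# Hypothetical values for illustration only (not from the handout).
X = [15, 16, 15, 17, 16]   # X1 ... X5
Y = [2, 3, 1, 4, 2]        # Y1 ... Y5

# ∑X = X1 + X2 + ... + Xn
sum_x = sum(X)

# ∑XY = X1Y1 + X2Y2 + ... + XnYn (sum of the pairwise products)
sum_xy = sum(x * y for x, y in zip(X, Y))

print("sum of X:", sum_x)
print("sum of XY:", sum_xy)
```

Note that ∑XY multiplies each pair first and then adds; it is generally not equal to (∑X)(∑Y).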


METHODS USED IN THE COLLECTION OF DATA

1. The Direct or Interview Method

This is a method of person-to-person exchange between the interviewer and the
interviewee. The interview method provides consistent and more precise information since
clarification may be given by the interviewee. However, this method is time-consuming,
expensive and has limited field coverage.

2. The Indirect or Questionnaire Method

In this method written responses are given to prepared questions. A questionnaire is a
list of questions intended to elicit answers to the problems of a study. This method is
inexpensive and can cover a wide area in a shorter span of time.

3. The Observation Method

The investigator observes the behaviour of persons or organizations and their
outcomes. It is usually used when the subjects cannot talk or write.

4. The Registration Method

This method of gathering information is enforced by law. Examples are the registration of
births, deaths, motor vehicles, marriages and licenses. The advantage is that information is kept
systematized and made available to all because of the requirement of the law.

5. The Experiment Method

This method is used when the objective is to determine the cause-and-effect relationship of
certain phenomena under controlled conditions.

Scientific researchers usually use the experiment method.

PRESENTATION OF DATA
Data collected must be organized in order to show significant characteristics. They can be
presented in three (3) forms.

1. Textual Presentation

Data are presented in paragraph form. Example: As measured by the personal
consumption expenditures in the Gross National Product, consumers spent only P38.8 billion
this year (2008), up a meager 6.2 percent from P36.53 billion last year (2007). In the previous
year (2006), personal consumption expenditures were likewise sluggish, growing by only 5.74
percent.


2. Tabular presentation

Data are presented in rows and columns. Tabulation is the process of condensing
classified data and arranging them in a table. The arrangement of the gathered data by
categories, together with their corresponding class frequencies and class marks or
midpoints, is called a Frequency Distribution Table (FDT).

FREQUENCY DISTRIBUTION TABLE (FDT)

A. FDT for NOMINAL DATA

The table should be headed by a number and a title to give the reader an idea of the
nature of the data being organized.

Table 1. Male and Female College Students of NDC Majoring in Mathematics

Sex        Frequency
Male          35
Female        82
Total        117

Here SEX is the characteristic being presented, and Male and Female are its categories.

B. FDT for ORDINAL DATA

The categories of the variable are scaled or graded.

Table 2. PNHS Teachers' Perception of the Leadership Behavior of their Principal

Perception of Leadership Behavior    Frequency
Strongly Favorable                       52
Favorable                                37
Slightly Favorable                       21
Unfavorable                              15
Total                                   125


C. FDT for INTERVAL DATA : (Ungrouped)

Given the following scores of fourth year students in Math test, make a frequency
distribution table.

Raw Scores: 25 31 25 29 31 25 40 35 26 34 40 27 38 32 39

Arrange the scores from highest to lowest.

Table 3. Scores of Fourth Year Students in Math

Score Frequency
40 2
39 1
38 1
37 0
36 0
35 1
34 1
33 0
32 1
31 2
30 0
29 1
28 0
27 1
26 1
25 3
Total 15
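The counting behind Table 3 can be reproduced with a minimal Python sketch. It uses the raw scores given above; the variable names are illustrative choices, not part of the handout.

```python
from collections import Counter

raw_scores = [25, 31, 25, 29, 31, 25, 40, 35, 26, 34, 40, 27, 38, 32, 39]

# Count how many times each score occurs.
counts = Counter(raw_scores)

# List every score from highest to lowest, including scores that never
# occur (a Counter returns 0 for a missing key).
fdt = [(score, counts[score])
       for score in range(max(raw_scores), min(raw_scores) - 1, -1)]

print("Score  Frequency")
for score, freq in fdt:
    print(f"{score:>5}  {freq:>9}")
print(f"Total  {sum(freq for _, freq in fdt):>9}")
```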

D. FDT for INTERVAL DATA (Grouped)

A grouped frequency distribution is an arrangement of data that shows the frequency of
occurrence of values falling within arbitrarily defined ranges of the variable.

These ranges of the variable are known as “class intervals”. The data are condensed
into a smaller number of groups or categories. A class interval is a grouping or category defined
by a lower and an upper limit.


Example: An arbitrary grouping such as 9-12 is called a class interval or simply a class. The end
numbers 9 and 12 are called class limits, lower and upper limits, respectively.

But the real or exact lower limit is 0.5 below the lower limit and the real or exact upper
limit is 0.5 above the upper limit.

In a class interval of 9-12, the exact lower limit is 8.5 and the exact upper limit is 12.5.

Example: (treating the data as grouped)

Construct a frequency distribution table (FDT) given the following data or scores.
47 56 42 28 56 65 53 82 42 62 57 38
62 52 66 39 55 47 54 55 54 48 52 47
44 56 37 42 50 62 48 56 42 60 41 72
48 78 68 68 N=40

Steps in Constructing A Grouped Frequency Distribution Table.


1. Compute the Range, R

Range = Highest Score – Lowest Score

R = 82 – 28 = 54

2. Find the Class Size, i

Determine the class size (also called the interval size or class interval size), i, by dividing
the range by the desired number of classes. The ideal number of class intervals ranges from 8
to 15. Therefore divide the range, R, in such a way that the resulting number of class intervals
falls between 8 and 15.

Class Interval Size, i = R ÷ (a number from 8 to 15)

i = 54 ÷ 11

i = 4.9

i = 5 (rounded)


3. Construct the Class Interval Column

Determine the class limits of the class intervals. Tabulation is facilitated if
the lower class limits of the class intervals are multiples of the class size. The bottom
interval must include the lowest score.

In our data, the lowest score is 28 and the interval size, i, is 5. Since 28 is not divisible by
5, the bottom class interval starts at a lower limit of 25 and ends at an upper limit of 29. So the
bottom class interval is 25-29, which contains the lowest score, 28.

Class Interval
80-84
75-79
70-74
65-69
60-64
55-59
50-54
45-49
40-44
35-39
30-34
25-29

There are 12 categories (the ideal number is 8-15).
The lower limits are divisible by the class size, i.
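Steps 1 to 3 can be sketched in Python as follows. The choice of 11 as the desired number of classes follows the worked example; rounding with `round` and the variable names are assumptions made for this illustration.

```python
scores = [47, 56, 42, 28, 56, 65, 53, 82, 42, 62, 57, 38,
          62, 52, 66, 39, 55, 47, 54, 55, 54, 48, 52, 47,
          44, 56, 37, 42, 50, 62, 48, 56, 42, 60, 41, 72,
          48, 78, 68, 68]

# Step 1: Range = highest score - lowest score.
data_range = max(scores) - min(scores)

# Step 2: class size = range / desired number of classes (8 to 15), rounded.
desired_classes = 11
class_size = round(data_range / desired_classes)

# Step 3: the bottom interval starts at the largest multiple of the class
# size that does not exceed the lowest score.
bottom_lower = (min(scores) // class_size) * class_size

intervals = []
lower = bottom_lower
while lower <= max(scores):
    intervals.append((lower, lower + class_size - 1))
    lower += class_size
intervals.reverse()  # list the highest interval first, as in the handout

print("R =", data_range, " i =", class_size)
for lo, hi in intervals:
    print(f"{lo}-{hi}")
```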

4. Tally each score in the class interval to which it belongs. Count the tallies, enter the
counts in the frequency (f) column, and add them. The sum is equal to the number of cases, N.

Class Interval   Tally       f
80-84            I           1
75-79            I           1
70-74            I           1
65-69            IIII        4
60-64            IIII        4
55-59            IIIII-II    7
50-54            IIIII-I     6
45-49            IIIII-I     6
40-44            IIIII-I     6
35-39            III         3
30-34                        0
25-29            I           1
                          N = 40
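The tally in step 4 can be sketched in Python. This version relies on the fact, noted in step 3, that every lower limit is a multiple of the class size; the dictionary layout is an illustrative choice.

```python
scores = [47, 56, 42, 28, 56, 65, 53, 82, 42, 62, 57, 38,
          62, 52, 66, 39, 55, 47, 54, 55, 54, 48, 52, 47,
          44, 56, 37, 42, 50, 62, 48, 56, 42, 60, 41, 72,
          48, 78, 68, 68]
class_size = 5

# Start every class interval (80-84 down to 25-29) with a frequency of 0,
# so that an empty class such as 30-34 still appears in the table.
frequencies = {(lower, lower + class_size - 1): 0
               for lower in range(80, 24, -class_size)}

# Because the lower limits are multiples of the class size, integer
# division finds the interval each score belongs to.
for score in scores:
    lower = (score // class_size) * class_size
    frequencies[(lower, lower + class_size - 1)] += 1

for (lo, hi), f in frequencies.items():
    print(f"{lo}-{hi}  {f}")
print("N =", sum(frequencies.values()))
```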

5. Determine the midpoints by averaging the lower limits and the upper limits.

Class Interval   Tally       f    Midpoint
80-84            I           1    82
75-79            I           1    77
70-74            I           1    72
65-69            IIII        4    67
60-64            IIII        4    62
55-59            IIIII-II    7    57
50-54            IIIII-I     6    52
45-49            IIIII-I     6    47
40-44            IIIII-I     6    42
35-39            III         3    37
30-34                        0    32
25-29            I           1    27
                          N = 40

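6. Determine the cumulative frequency for “less than” (<cf) and “greater than” (>cf). The <cf
column accumulates the frequencies from the lowest interval upward; the >cf column accumulates
them from the highest interval downward.

Class Interval   f    Midpoint   <cf   >cf
80-84            1    82         40     1
75-79            1    77         39     2
70-74            1    72         38     3
65-69            4    67         37     7
60-64            4    62         33    11
55-59            7    57         29    18
50-54            6    52         22    24
45-49            6    47         16    30
40-44            6    42         10    36
35-39            3    37          4    39
30-34            0    32          1    39
25-29            1    27          1    40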


7. Determine the Relative Frequency, RF(%) = (f ÷ N) x 100%.

Class Interval   Tally       f    Midpoint   <cf   >cf   RF (%)
80-84            I           1    82         40     1     2.5
75-79            I           1    77         39     2     2.5
70-74            I           1    72         38     3     2.5
65-69            IIII        4    67         37     7    10
60-64            IIII        4    62         33    11    10
55-59            IIIII-II    7    57         29    18    17.5
50-54            IIIII-I     6    52         22    24    15
45-49            IIIII-I     6    47         16    30    15
40-44            IIIII-I     6    42         10    36    15
35-39            III         3    37          4    39     7.5
30-34                        0    32          1    39     0
25-29            I           1    27          1    40     2.5
                          N = 40                        100%

Final Table

Table 5. Grouped Frequency, Cumulative & Relative Frequency Distribution of the Statistics
Test Scores of 40 M.A. Students

Class Interval   Tally       f    Midpoint   <cf   >cf   RF (%)
80-84            I           1    82         40     1     2.5
75-79            I           1    77         39     2     2.5
70-74            I           1    72         38     3     2.5
65-69            IIII        4    67         37     7    10
60-64            IIII        4    62         33    11    10
55-59            IIIII-II    7    57         29    18    17.5
50-54            IIIII-I     6    52         22    24    15
45-49            IIIII-I     6    47         16    30    15
40-44            IIIII-I     6    42         10    36    15
35-39            III         3    37          4    39     7.5
30-34                        0    32          1    39     0
25-29            I           1    27          1    40     2.5
                          N = 40                        100%
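Steps 5 to 7 can be checked with a short Python sketch that starts from the class frequencies. The list layout (highest class first) and the variable names are assumptions made for this illustration.

```python
lower_limits = [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25]  # highest first
freqs = [1, 1, 1, 4, 4, 7, 6, 6, 6, 3, 0, 1]
class_size = 5
n = sum(freqs)

# Step 5: midpoint = (lower limit + upper limit) / 2.
midpoints = [(lo + (lo + class_size - 1)) / 2 for lo in lower_limits]

# Step 6: ">cf" accumulates from the highest class downward ...
greater_cf, running = [], 0
for f in freqs:
    running += f
    greater_cf.append(running)

# ... and "<cf" accumulates from the lowest class upward.
less_cf, running = [], 0
for f in reversed(freqs):
    running += f
    less_cf.append(running)
less_cf.reverse()

# Step 7: relative frequency in percent, RF = (f / N) x 100.
rel_freq = [f * 100 / n for f in freqs]

for row in zip(lower_limits, freqs, midpoints, less_cf, greater_cf, rel_freq):
    lo, f, m, lcf, gcf, rf = row
    print(f"{lo}-{lo + class_size - 1}  f={f}  M={m}  <cf={lcf}  >cf={gcf}  RF={rf}%")
```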



CONSIDERATIONS IN TABULAR PRESENTATION OF DATA
Tables containing statistical data should be constructed with the following considerations:
1. Every table should be self-explanatory. The title should be short but should state clearly what
the table is all about.
2. Columns should be appropriately labeled and arranged logically.
3. Any necessary footnotes, such as the source of data, should be placed at the bottom of
the table.
4. Each table should be numbered for easy reference. The table number may be centered above
the table title or given at the beginning of the title.
5. A table often starts with a single or double horizontal line. Another horizontal line separates
the column headings from the table body. Vertical lines may be used to separate the columns to
facilitate comprehension.

_____________________________________________________________________________
Exercise: Based on the raw data given, construct a grouped frequency distribution table.

Given is the distribution of statistics test scores of 40 students.

Given data: 30 43 33 51 67 50 30 48 48 39
65 37 42 24 53 80 61 49 46 28
47 49 52 42 40 50 42 47 61 68
70 38 20 56 75 18 38 29 58 59

N = 40


ASSIGNMENT:
1. Construct a 3x2 frequency distribution table for nominal data of LRC students,
SY 2020-2021. Make your own table title.
2. Construct a 4x2 frequency distribution table for nominal data of Panabo National
High School Teachers, SY 2020-2021. Make your own table title.

3. Construct two frequency distribution tables for ordinal data. Use sections Fauna and
Flora in the collection of your data. Each characteristic being presented should have at
least five categories. Make your own table title.

4. Construct a frequency distribution table for interval data (ungrouped). Provide your
own data variable. Make your own table title.

5. Construct a combined frequency distribution table for nominal data and interval data
(ungrouped). Make your own table title.

6. Construct a grouped frequency distribution table.

Statistics test scores of second year LRC students.


36 51 41 46 40 46 40 44 32 38 44

44 45 48 37 40 43 45 45 41 38 46
50 45 41 32 34 55 47 48 50 43 53
42
N = 34


3. Graphical presentation of data

A graph is a geometric or pictorial image of a set of data (a picture of numerical data). Data
can be presented graphically according to their scale or level of measurement. Graphical
presentations are:

1. Frequency Polygon or line graph

2. Histogram or bar graph

3. Pie chart or circle graph

Advantages of graphical presentation

• Graphs enable readers to easily grasp the essential facts that numerical data intend to convey.

• Graphs easily attract attention and are more readily understood. They can be bolstered by the
use of colors and pictorial diagrams.

• Graphs simplify concepts that would otherwise have been expressed in many words.

FREQUENCY POLYGON OR LINE GRAPH

In constructing a frequency polygon, each class frequency is plotted directly above the midpoint
of its class (for grouped data) and above the score (for ungrouped data). Lines are then drawn to connect
the points.

To close the polygon an additional class or score is added at both ends of the distribution and the
ends of the graph are brought down to the baseline at the midpoint or score of the additional classes.
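The points of a frequency polygon can be computed before drawing. The sketch below uses the ungrouped math scores of the fourth year students and adds one extra score at each end with zero frequency to close the polygon; it only prepares the (score, frequency) pairs and leaves the drawing to paper or any plotting tool.

```python
from collections import Counter

scores = [25, 40, 31, 25, 40, 27, 38, 35, 25, 26, 29, 32, 31, 34, 39]
counts = Counter(scores)

lowest, highest = min(scores), max(scores)

# One point per score, plus an additional score below the lowest and above
# the highest. Those two extra points have frequency 0, which brings the
# ends of the polygon down to the baseline.
points = [(x, counts[x]) for x in range(lowest - 1, highest + 2)]

for x, f in points:
    print(f"score {x}: f = {f}")
```

For grouped data the same idea applies, with class midpoints in place of individual scores.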


Example: For Interval / Ratio ungrouped data:

Make a frequency polygon or line graph of the scores in math of 4th year students.

Scores: 25 40 31 25 40 27 38 35 25 26 29 32 31 34 39

Construct a frequency distribution table:

Table 3. Scores of Fourth Year Students in Math

Score Frequency
40 2
39 1
38 1
37 0
36 0
35 1
34 1
33 0
32 1
31 2
30 0
29 1
28 0
27 1
26 1
25 3
Total 15

[Figure: frequency polygon. Horizontal axis: Scores (24 to 41); vertical axis: frequency, f (0 to 3).]

Figure 5. Frequency Polygon of Scores of Fourth Year Students in Mathematics

Example: For Interval / Ratio grouped data:

Make a frequency polygon of scores of students in statistics.


Scores: 25 30 41 20 11 35 44 26 17 28 48 31 9 19 39
34 15 22 32 36 29 6 13 27 33 26 37 26 24 24
Range = H.S. – L.S. = 48 – 6 = 42
i = R ÷ 8 = 42 ÷ 8 = 5.25, so i = 5
Construct a frequency distribution table:

Table 6. Scores of Students in Statistics

Class Interval Frequency Midpoint


45-49 1 47
40-44 2 42
35-39 4 37
30-34 5 32
25-29 7 27
20-24 4 22
15-19 3 17
10-14 2 12
5-9 2 7
N = 30

[Figure: frequency polygon. Horizontal axis: Midpoints (2 to 52); vertical axis: Frequency (0 to 7).]

Figure 7. Frequency Polygon Representing Scores of Students in Statistics

Histogram

A histogram is a graphical presentation of a frequency distribution using rectangles
constructed over the lower and upper limits of each class interval. It consists of a set of
rectangles whose bases lie on the horizontal axis and are centered on the class marks (midpoints).

The base width corresponds to the class size; the height of each rectangle corresponds to the
class frequency or relative frequency.
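The geometry of the rectangles can be sketched in Python using Table 6 (Scores of Students in Statistics). The sketch assumes each bar spans the exact limits of its class (0.5 below the lower limit to 0.5 above the upper limit), so that adjacent bars touch; the handout itself only says the bars are constructed over the limits.

```python
# (lower limit, upper limit, frequency) for each class in Table 6.
classes = [(5, 9, 2), (10, 14, 2), (15, 19, 3), (20, 24, 4), (25, 29, 7),
           (30, 34, 5), (35, 39, 4), (40, 44, 2), (45, 49, 1)]

bars = []
for lower, upper, f in classes:
    left = lower - 0.5           # exact lower limit (assumed bar edge)
    right = upper + 0.5          # exact upper limit (assumed bar edge)
    bars.append({
        "left": left,
        "right": right,
        "center": (left + right) / 2,   # equals the class midpoint
        "width": right - left,          # equals the class size
        "height": f,                    # class frequency
    })

for bar in bars:
    print(bar)
```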

Example :

25 30 41 20 11 35 44 26 17 28

48 31 9 19 39 34 15 22 32 36

29 6 13 27 33 26 37 26 24 24


Construct a frequency distribution table:

Table 6. Scores of Students in Statistics

Class Interval Frequency Midpoint


45-49 1 47
40-44 2 42
35-39 4 37
30-34 5 32
25-29 7 27
20-24 4 22
15-19 3 17
10-14 2 12
5-9 2 7
N = 30

HISTOGRAM

[Figure: histogram. Horizontal axis: Midpoints (2 to 52); vertical axis: Frequency (0 to 7).]

Figure 7. Histogram of the Frequency Distribution of Statistics Test Scores of Students


Histogram for Ordinal Data

Example from tabulated data.

Table 5.2 Social Classes Distribution of LRC Students

Social Class            f
Very High, VH           7
High, H                16
Above Average, AA      25
Below Average, BA      47
Low, L                 40
Very Low, VL           27

[Figure: histogram. Horizontal axis: social class (VL, L, BA, AA, H, VH); vertical axis: frequency (0 to 50).]

Figure 3. Histogram of Social Classes Distribution of LRC Students

Activity: Graph the given data using histogram.

35 51 41 46 40 46 40 44 32 38 44 44

45 48 37 40 43 45 45 40 38 46 50

45 41 30 34 55 47 48 50 43 53 42 N = 34

BAR GRAPH

The bar graph is the most common and widely used graphical device. It consists of bars or heavy
lines of equal width, either all vertical or all horizontal. The X to Y ratio is 4:3. The length of
each bar represents the magnitude of the quantity being compared. The spaces between bars are
equal and not less than one-half of the bar width. It is used to represent data not arranged in a
step distribution.

Example: Bar Graph for Nominal Data

Table 1. Marital Status Distribution of Graduate Students

Marital Status            Frequency
Single, S                    54
Married, M                   67
Widow, W                     31
Legally Separated, LS        36

[Figure: bar graph. Horizontal axis: Marital Status (S, M, W, LS); vertical axis: frequency, F (0 to 70).]

Figure 1. Bar Graph of Marital Status Distribution of Graduate Students

Activity: Make a bar graph of the Marital Status Distribution of Graduate Students drawn
horizontally. Use the space below.


Example: Bar Graph for Ordinal Data

Table 5.2 Social Classes Distribution of LRC Students

Social Class            f
Very High, VH           7
High, H                16
Above Average, AA      25
Below Average, BA      47
Low, L                 40
Very Low, VL           27

[Figure: bar graph. Horizontal axis: social class (VL, L, BA, AA, H, VH); vertical axis: frequency (0 to 50).]

Figure 3. Bar Graph of Social Classes Distribution of LRC Students

Exercise: Make a Bar Graph for Ordinal Data. The characteristic being presented should have at
least three categories. Gather data in your own section. Use the space below.


PIE CHART OR CIRCLE GRAPH FOR NOMINAL DATA

Example from tabulated data:

Table 3.2 Enrolment from elementary to college students of North Davao Colleges SY 2009-2010

Enrolment       F       RF (%)
Elementary      300      25
High School     400      33
College         500      42
Total          1200     100

[Figure: pie chart with three sectors: Elementary 25%, High School 33%, College 42%.]

Figure 3.2. Enrolment from elementary to college students of North Davao Colleges
SY 2011-2012
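The percentages in Table 3.2 can be reproduced with a short Python sketch. The central angle of each sector (share of the total times 360 degrees) is not stated in the handout; it is included here as the usual way to draw the sectors by hand.

```python
enrolment = {"Elementary": 300, "High School": 400, "College": 500}
total = sum(enrolment.values())

slices = {}
for level, f in enrolment.items():
    slices[level] = {
        "percent": round(f * 100 / total),   # RF (%), rounded to a whole number
        "angle": f * 360 / total,            # central angle in degrees (assumed step)
    }

for level, info in slices.items():
    print(f"{level}: {info['percent']}%  angle = {info['angle']} degrees")
```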


Exercise: Make your own example (at least three categories) of a pie chart or circle graph for
nominal data.

Study how to present data using an ogive graph.

RANKING OF SCORES
Ranking is the arranging of scores from highest to lowest and assigning consecutive numbers:
1 to the highest, 2 to the next highest, and so on until the last score is covered. In case of
tied scores, the average of their initial ranks serves as the final rank of each.

Example: Rank the following scores.

34 26 36 24 27 20 26 26 30 41 35 15 28 29 39 19 33
11 32 24 17 13 41 9 37 44 22 25 48 6

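Prepare a Master Sheet: H.S. = 48, L.S. = 6

Rows are the tens digits and columns are the units digits of the scores (a dash marks an
empty cell).

      0    1    2    3    4    5    6    7    8    9    Total
4     -    II   -    -    I    -    -    -    I    -      4
3     I    -    I    I    I    I    I    I    -    I      8
2     I    -    I    -    II   I    III  I    I    I     11
1     -    I    -    I    -    I    -    I    -    I      5
0     -    -    -    -    -    -    I    -    -    I      2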

No. Score Rank No. Score Rank No. Score Rank


1. 48 1 11. 32 11 21. 24 20.5
2. 44 2 12. 30 12 22. 22 22
3. 41 3.5 13. 29 13 23. 20 23
4. 41 3.5 14. 28 14 24. 19 24
5. 39 5 15. 27 15 25. 17 25
6. 37 6 16. 26 17 26. 15 26
7. 36 7 17. 26 17 27. 13 27
8. 35 8 18. 26 17 28. 11 28
9. 34 9 19. 25 19 29. 9 29
10. 33 10 20. 24 20.5 30. 6 30
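The ranking rule, including the averaging of tied ranks, can be sketched in Python. The sketch uses the 30 scores of the example; the dictionary `ranks` (score to final rank) is an illustrative choice.

```python
scores = [34, 26, 36, 24, 27, 20, 26, 26, 30, 41, 35, 15, 28, 29, 39,
          19, 33, 11, 32, 24, 17, 13, 41, 9, 37, 44, 22, 25, 48, 6]

ordered = sorted(scores, reverse=True)   # highest to lowest

ranks = {}
i = 0
while i < len(ordered):
    # Find the run of tied scores starting at position i.
    j = i
    while j < len(ordered) and ordered[j] == ordered[i]:
        j += 1
    # The tied scores occupy initial ranks i+1 through j; their average
    # is the final rank of each.
    ranks[ordered[i]] = (i + 1 + j) / 2
    i = j

for position, score in enumerate(ordered, start=1):
    print(f"{position:>2}. {score:>3}  rank {ranks[score]}")
```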



Activity: Rank the following scores.

Scores:46 38 44 22 26 27 25 29 29 25
20 40 36 38 30 45 32 44 44 41

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a single figure which is representative of the general level of
magnitudes or values of the items in a set of data. This figure is used to represent all the
numbers in the set. When the data are arranged according to magnitude, it tends to lie centrally
within the set.
These are the scores or values that tend to locate themselves at the center of a rank-ordered or
step distribution.

1. Mean, x̅ - another term for the arithmetic average; representative of a step distribution.

2. Median, x̃ – the middlemost score or value, above which lie 50% of the distribution and
below which lie the remaining 50%. The centermost score in a distribution.

3. Mode, x̂ – the most common or most frequently occurring score in a distribution.

For UNGROUPED Data

Example: Scores 89 70 84 75 83 78 81 N=7

1. Find the Mean, x̅:


x̅ = Ʃx ÷ N
x̅ = (89+70+84+75+83+78+81) ÷ 7
x̅ = 560 ÷ 7
x̅ = 80

Where x = scores
Ʃx = sum of scores
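The computation of the mean can be checked with a few lines of Python, using the seven scores of the example.

```python
scores = [89, 70, 84, 75, 83, 78, 81]

n = len(scores)          # N = 7
total = sum(scores)      # sum of the scores
mean = total / n         # mean = sum of scores / N

print("sum =", total, " N =", n, " mean =", mean)
```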


2. Find the Median, x̃:

Position of the median = (N + 1) ÷ 2 th score
                       = (7 + 1) ÷ 2 th
                       = 8 ÷ 2 th
                       = 4th score

That is, arrange the scores from highest to lowest or vice versa. The 4th score is the median.
Scores: 89 84 83 81 78 75 70
        1st 2nd 3rd 4th 5th 6th 7th
The 4th score is 81.
Median, x̃ = 81
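The position rule can be sketched in Python. The handout's example has an odd number of scores; the branch for an even N, which averages the two middle scores, is the usual convention and is added here as an assumption.

```python
scores = [89, 70, 84, 75, 83, 78, 81]

ordered = sorted(scores, reverse=True)   # highest to lowest
n = len(ordered)
position = (n + 1) / 2                   # position of the median

if n % 2 == 1:
    # Odd N: the position is a whole number, so take that score.
    median = ordered[int(position) - 1]
else:
    # Even N (assumed convention): average the two middle scores.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print("position =", position, " median =", median)
```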

3. Find the Mode, x̂:

x̂ = the most common or most frequent score in a distribution.

Example: A. 89 88 86 85 85 84 83 82   Mode = 85 (unimodal)
         B. 75 76 76 77 78 79 79 80   Mode = 76 and 79 (bimodal)
         C. 85 84 84 81 78 75 75 75   Mode = 75 (unimodal)

If there is NO common or frequent score in the distribution, use the theoretical mode.

Theoretical Mode, x̂t = 3(median) – 2(mean)

Example: 89 84 83 81 78 75 70 (Mean = 80) (Median = 81)

Theoretical Mode, x̂t = 3(81) – 2(80)
Theoretical Mode, x̂t = 243 – 160
Theoretical Mode, x̂t = 83
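Both cases can be sketched in Python. The helper names `modes` and `theoretical_mode` are hypothetical, introduced only for this illustration; `modes` returns an empty list when no score repeats, which is the situation where the handout falls back on the theoretical mode.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), or [] if no score repeats."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(score for score, c in counts.items() if c == highest)

def theoretical_mode(mean, median):
    """Theoretical mode = 3(median) - 2(mean)."""
    return 3 * median - 2 * mean

print(modes([89, 88, 86, 85, 85, 84, 83, 82]))   # one mode: unimodal
print(modes([75, 76, 76, 77, 78, 79, 79, 80]))   # two modes: bimodal
print(modes([89, 84, 83, 81, 78, 75, 70]))       # no repeated score
print(theoretical_mode(mean=80, median=81))
```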
Example: Find the mode.
85 84 83 81 80 79 78 75



Exercise: Find the mean, median and mode of the given scores or data treating them as
ungrouped.

Data: 41 44 17 34 22 06 26 24 20 26 19 32 13 33 25
39 11 48 09 36 27 26 30 35 28 41 15 29 37 24

Please Solve.

MEASURES OF CENTRAL TENDENCY FOR GROUPED DATA:


Construct a frequency distribution table and make columns needed or required by the
formula.

Example: Compute the mean, median, and mode of the following scores or set of data.
25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30

COMPUTATION OF THE MEAN: BY MIDPOINT METHOD


Formula:
Mean, x̅ = ΣFM ÷ N, where F is the frequency and M is the midpoint of the class interval

R = HS – LS
R = 48 – 6
R = 42
i = R ÷ 8
i = 42 ÷ 8
i = 5.25
i = 5, approximately

Construct a frequency distribution table (FDT) with columns F, M, & FM.

Class Interval   Tally       F    M     FM
45-49            I           1    47     47
40-44            III         3    42    126
35-39            IIII        4    37    148
30-34            IIII        4    32    128
25-29            IIIII II    7    27    189
20-24            IIII        4    22     88
15-19            III         3    17     51
10-14            II          2    12     24
5-9              II          2     7     14
                          N = 30      ΣFM = 815



Compute the mean by using the midpoint method formula.
Mean, x̅ = ΣFM ÷ N
Mean, x̅ = 815 ÷ 30
Mean, x̅ = 27.17
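The midpoint method can be verified with a short Python sketch built from the class intervals and frequencies of the table; the tuple layout is an illustrative choice.

```python
# ((lower limit, upper limit), frequency F) for each class interval.
table = [((45, 49), 1), ((40, 44), 3), ((35, 39), 4), ((30, 34), 4),
         ((25, 29), 7), ((20, 24), 4), ((15, 19), 3), ((10, 14), 2),
         ((5, 9), 2)]

n = sum(f for _, f in table)

# FM = frequency x midpoint, where midpoint M = (lower + upper) / 2.
sum_fm = sum(f * (lower + upper) / 2 for (lower, upper), f in table)

mean = sum_fm / n
print("N =", n, " sum of FM =", sum_fm, " mean =", round(mean, 2))
```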

COMPUTATION OF THE MEAN: BY DEVIATION METHOD

Formula:
• Mean, x̅ = A.M. + (ΣFd ÷ N) i

Data: 25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30

Construct a frequency distribution table with the columns required by the formula: F, d, & Fd.

1. Locate the middlemost class or step interval at N/2. Assign zero (0) deviation to this
interval; number the intervals above it with positive numbers 1, 2, 3, etc., and the
intervals below it with negative numbers -1, -2, -3, etc.
2. Get the product of F & d for each interval and find the sum, ΣFd.
3. The assumed mean (A.M.) is the midpoint of the class interval with zero (0) deviation.

R = HS – LS
R = 48 – 6
R = 42

i = R ÷ 8
i = 42 ÷ 8
i = 5.25
i = 5, approximately


Construct a frequency distribution table (FDT) with columns F, d, & Fd.

Class Interval   Tally       F     d     Fd
45-49            I           1     4      4
40-44            III         3     3      9
35-39            IIII        4     2      8
30-34            IIII        4     1      4
25-29            IIIII II    7     0      0    (interval containing N÷2)
20-24            IIII        4    -1     -4
15-19            III         3    -2     -6
10-14            II          2    -3     -6
5-9              II          2    -4     -8
                          N = 30       ΣFd = 1

Compute the mean by using the deviation method formula.

Mean, x̅ = A.M. + (ΣFd ÷ N) i
Mean, x̅ = 27 + (1 ÷ 30) 5
Mean, x̅ = 27.17
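The deviation method can be checked the same way. The sketch below takes the F and d columns of the table, with the assumed mean at the midpoint of 25-29; it should agree with the midpoint method.

```python
class_size = 5
assumed_mean = 27   # midpoint of 25-29, the interval with zero deviation

# (F, d) for each class interval, from 45-49 down to 5-9.
rows = [(1, 4), (3, 3), (4, 2), (4, 1), (7, 0),
        (4, -1), (3, -2), (2, -3), (2, -4)]

n = sum(f for f, _ in rows)
sum_fd = sum(f * d for f, d in rows)

# Mean = A.M. + (sum of Fd / N) x i
mean = assumed_mean + (sum_fd / n) * class_size
print("N =", n, " sum of Fd =", sum_fd, " mean =", round(mean, 2))
```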

COMPUTATION OF THE MEDIAN:
Steps: 1. Locate the median interval at N/2, counting from the bottom or lowest class interval.
       2. Compute the median.

Formula: Median, $\tilde{x} = L.L. + \left(\dfrac{N/2 - cf}{f}\right) i$

Where: L.L. = exact lower limit of the class interval containing the median
cf = less than cumulative frequency (<cf) of the step interval below the
location of the median
f = frequency of the step interval containing the median
i = class size or interval size



Data: 25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
Construct an FDT:

Class Interval    F    <cf
45-49             1    30
40-44             3    29
35-39             4    26
30-34             4    22
25-29             7    18   (median interval, f = 7)
20-24             4    11   (cf = 11)
15-19             3     7
10-14             2     4
5-9               2     2

Compute the median, x͂:

Md, x͂ = L.L. + [(N/2 – cf) ÷ f] i
Md, x͂ = 24.5 + [(30/2 – 11) ÷ 7] 5
Md, x͂ = 24.5 + [(15 – 11) ÷ 7] 5
Md, x͂ = 24.5 + (4 ÷ 7) 5
Md, x͂ = 27.36
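
A minimal Python sketch of the grouped-data median formula. It assumes whole-number scores, so the exact lower limit is the stated lower limit minus 0.5; the function name is illustrative.

```python
def grouped_median(classes, width):
    """classes: (lower limit, frequency) pairs, lowest interval first."""
    n = sum(f for _, f in classes)
    cf = 0  # cumulative frequency below the current interval
    for lower, f in classes:
        if cf + f >= n / 2:                # this interval holds the N/2-th case
            exact_lower = lower - 0.5      # L.L.
            return exact_lower + (n / 2 - cf) / f * width
        cf += f

classes = [(5, 2), (10, 2), (15, 3), (20, 4), (25, 7),
           (30, 4), (35, 4), (40, 3), (45, 1)]
median = grouped_median(classes, 5)
print(round(median, 2))  # 27.36
```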
COMPUTATION OF THE MODE:

Mode, x̂ – the midpoint of the class or step interval having the highest frequency.

Example: The interval 25-29 has the highest frequency, f = 7, so Mo, x̂ = 27. This is a crude measure of the mode.

Theoretical Mode, x̂ = 3(Median) – 2(Mean)
x̂ = 3(27.36) – 2(27.17)
x̂ = 82.08 – 54.34
x̂ = 27.74
OTHER MEASURES OF LOCATION:

QUANTILES – a natural extension of the median concept: values that divide a set of data into equal parts. While the median divides the distribution into two (2) equal parts, quantiles divide it into four (4), ten (10), or one hundred (100) equal parts.
1. Quartile – the quantile which divides the distribution into four (4) parts or quarters.
2. Decile – the quantile which divides the distribution into ten (10) parts.
3. Percentile – the quantile which divides the distribution into one hundred (100) parts.

QUARTILES – are values that divide the distribution into four quarters.
Q1 – score or value below which lie one-fourth (1/4) or 25% of the cases.
Q2 – score or value below which lie one-half (1/2) or 50% of the cases also known as
median.
Q3 – score or value below which lie three-fourths (3/4) or 75% of the cases.
Q4 – score or value below which lie one (1) or 100% of the cases.

QUARTILES FOR UNGROUPED DATA


Example: Given the scores, find the value of Q1, Q2, Q3, & Q4
97 77 80 81 89 77 83 74 73
62 76 87 77 73 68 81 85 94
71 92 78 67 65 63 56 N = 25

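Construct a Master Sheet (rows: tens digit; columns: units digit; one tally per score; - means no score):

Tens   0    1    2    3    4    5    6    7    8    9
9      -    -    I    -    I    -    -    I    -    -
8      I    II   -    I    -    I    -    I    -    I
7      -    I    -    II   I    -    I    III  I    -
6      -    -    I    I    -    I    -    I    I    -
5      -    -    -    -    -    -    I    -    -    -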


Arrange the scores from highest to lowest (for ungrouped data):

97 94 92 89 87 85 83 81 81 80 78 77 77 77 76 74 73 73 71 68 67 65 63 62 56

Solving Q1:

Q1 = ¼ N th
Q1 = ¼ (25) th
Q1 = 6.25 th

Count from the bottom to the 6.25th score. The 6th score is 68 and the 7th score is 71. To find the 6.25th score, divide the difference between 71 and 68 by 4: (71 – 68) ÷ 4 = 0.75. Each 0.25 step in position therefore adds 0.75 to the score, so the 6.25th score is 68 + 0.75 = 68.75.

Q1 = 68.75
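
The counting-and-interpolating procedure above can be sketched in Python. The helper name score_at_position is an assumption made for illustration; it assumes the position is at least 1.

```python
def score_at_position(scores, position):
    """Score at a 1-based, possibly fractional position counted from the lowest."""
    if position < 1:
        raise ValueError("position must be at least 1")
    ordered = sorted(scores)
    whole = int(position)
    fraction = position - whole
    if fraction == 0 or whole >= len(ordered):
        return ordered[whole - 1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)   # interpolate between neighbors

scores = [97, 77, 80, 81, 89, 77, 83, 74, 73,
          62, 76, 87, 77, 73, 68, 81, 85, 94,
          71, 92, 78, 67, 65, 63, 56]
q1 = score_at_position(scores, len(scores) / 4)   # the 6.25th score
print(q1)  # 68.75
```

Statistical packages use several slightly different interpolation rules, so a library quantile function may not reproduce this hand method exactly.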

Solve Q2 (same as the median):     Solve Q3:     Solve Q4:


QUARTILES FOR GROUPED DATA

Data: 25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
H.S.=48 L.S.=6 R=42 i=5
Table:
Class Interval    f    <cf
45-49             1    30
40-44             3    29
35-39             4    26
30-34             4    22
25-29             7    18
20-24             4    11   (Q1 interval)
15-19             3     7
10-14             2     4
5-9               2     2
N = 30

Solve Q1: Q1 = L.L. + [(N/4 – cf) ÷ f] i
Solve Q2: Q2 = L.L. + [(N/2 – cf) ÷ f] i
Solve Q3: Q3 = L.L. + [(3N/4 – cf) ÷ f] i

Where: L.L. = exact lower limit of the class interval containing Q1, Q2, or Q3
cf = cumulative frequency of the class interval just below the one containing Q1, Q2, or Q3
f = frequency of the class interval containing Q1, Q2, or Q3
i = class size

For Q1:
Q1 = L.L. + [(N/4 – cf) ÷ f ] i
Q1 = 19.5 + [(30/4 – 7)÷4]5
Q1 = 19.5 + [(7.5 – 7)÷4]5
Q1 = 19.5 + (0.5÷4)5
Q1 = 19.5 + 0.625
Q1 = 20.125
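
The same computation as a Python sketch. The function grouped_percentile is a hypothetical helper that locates the interval holding the required case and applies the formula; it assumes whole-number scores (exact lower limit = lower limit minus 0.5).

```python
def grouped_percentile(classes, width, percent):
    """classes: (lower limit, frequency) pairs, lowest interval first."""
    n = sum(f for _, f in classes)
    target = percent * n / 100   # N/4 for Q1, N/2 for Q2, 3N/4 for Q3
    cf = 0                       # cumulative frequency below the current interval
    for lower, f in classes:
        if cf + f >= target:
            return (lower - 0.5) + (target - cf) / f * width
        cf += f

classes = [(5, 2), (10, 2), (15, 3), (20, 4), (25, 7),
           (30, 4), (35, 4), (40, 3), (45, 1)]
q1 = grouped_percentile(classes, 5, 25)
print(q1)  # 20.125
```

Passing 50 or 75 as the last argument applies the Q2 and Q3 formulas in the same way.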

Solve for Q2: Solve for Q3: Solve for Q4:


PERCENTILES - are scores or values below which a certain percentage of the cases lie.

For ungrouped data:


To find P50 for a set of scores with N = 300, multiply N by 50 (for P50) and divide by 100. The result is a position in the ordered list, not a score.
P50 = (50 x 300)/100 th
P50 = (15000)/100 th
P50 = 150 th
P50 is the 150th score when the scores are arranged from lowest to highest. If the 150th score is 100, then P50 is 100.

Example: Solve P60 for the following list of scores resulting from a statistics examination
administered to 50 students.

Scores: 48 94 89 91 85 98 68 79 91 62 98
53 54 76 66 62 62 99 96 59 72 64 55
82 68 78 77 93 70 49 57 79 88 61 90
73 70 59 51 92 43 69 46 52 100 73 61
93 83 58

Construct a master sheet to arrange the scores (read down each column, lowest to highest).

43   58   68   78   91
46   59   68   79   92
48   60   69   79   93
49   61   70   82   93
51   61   70   83   94
52   62   72   85   96
53   62   73   88   98
54   62   73   89   98
55   64   76   90   99
57   66   77   91   100

P60 = 60(50)/100 = 30th

The 30th score from the lowest is 77 (the last score in the third column), so P60 = 77.

Compute P40:          Compute P75:
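
A Python sketch of the ungrouped percentile procedure, using the scores as listed in the example. The function name is illustrative, and the interpolation for fractional positions is an assumption carried over from the quartile example; it assumes the position is at least 1.

```python
import math

def percentile_score(scores, k):
    """Score at the k-th percentile position, counted from the lowest."""
    ordered = sorted(scores)
    position = k * len(ordered) / 100            # 1-based position
    below = ordered[math.floor(position) - 1]
    above = ordered[min(math.ceil(position), len(ordered)) - 1]
    return below + (position - math.floor(position)) * (above - below)

scores = [48, 94, 89, 91, 85, 98, 68, 79, 91, 62, 98,
          53, 54, 76, 66, 62, 62, 99, 96, 59, 72, 64, 55,
          82, 68, 78, 77, 93, 70, 49, 57, 79, 88, 61, 90,
          73, 70, 59, 51, 92, 43, 69, 46, 52, 100, 73, 61,
          93, 83, 58]
p60 = percentile_score(scores, 60)   # the 30th score
print(p60)  # 77.0
```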


For Grouped Data

P = L.L. + [(%N – cf) ÷ f] i

L.L. = exact lower limit of the interval containing the %N-th score counted from the lowest
N = number of cases
cf = cumulative frequency of the interval just below the one containing the %N-th score
f = frequency of the class interval containing the %N-th score
i = class size

Class Interval, X    f    <cf
100-104              1    50
95-99                4    49
90-94                7    45
85-89                3    38
80-84                2    35
75-79                5    33
70-74                5    28   (interval containing %N for P50, f = 5)
65-69                4    23   (cf = 23)
60-64                7    19
55-59                4    12
50-54                4     8
45-49                3     4
40-44                1     1
N = 50

Example: Compute the P50:

%N = 50%(N) = 50%(50) = 25
Therefore, %N = 25 is at class interval 70-74. The exact lower limit, L.L., is 69.5.
The cf = 23 and f = 5.

P50 = L.L. + [(%N – cf)/f] i
P50 = 69.5 + [(50% x 50 – 23)/5] 5
P50 = 69.5 + [(25 – 23)/5] 5
P50 = 69.5 + (2/5) 5
P50 = 69.5 + 2
P50 = 71.5
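
The P50 computation can be reproduced with a short Python sketch. The helper grouped_percentile is hypothetical and assumes whole-number scores, so each exact lower limit is the stated lower limit minus 0.5.

```python
def grouped_percentile(classes, width, percent):
    """classes: (lower limit, frequency) pairs, lowest interval first."""
    n = sum(f for _, f in classes)
    target = percent * n / 100   # %N
    cf = 0                       # cumulative frequency below the current interval
    for lower, f in classes:
        if cf + f >= target:
            return (lower - 0.5) + (target - cf) / f * width
        cf += f

classes = [(40, 1), (45, 3), (50, 4), (55, 4), (60, 7), (65, 4), (70, 5),
           (75, 5), (80, 2), (85, 3), (90, 7), (95, 4), (100, 1)]
p50 = grouped_percentile(classes, 5, 50)
print(p50)  # 71.5
```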

Compute the P10 : Compute the P90 :



PERCENTILE RANK, Pr – relative rank position of a score which indicates the percentage of cases
that lie below it.

Example: For Ungrouped Data


Compute the percentile rank, Pr, of score 77. (refer to previous example ungrouped data)

Scores: 48 94 89 91 85 98 68 79 91 62 98
53 54 76 66 62 62 99 96 59 72 64 55
82 68 78 77 93 70 49 57 79 88 61 90
73 70 59 51 92 43 69 46 52 100 73 61
93 83 58

Construct a master sheet to arrange the scores.

43 58 68 78 91
46 59 68 79 92
48 60 69 79 93
49 61 70 82 93
51 61 70 83 94
52 62 72 85 96
53 62 73 88 98
54 62 73 89 98
55 64 76 90 99
57 66 77 (30th) 91 100

For Ungrouped Data: Arrange the scores. Determine the placement of the score from the
lowest.
Pr = (SP x 100%)/N Where: SP – score placement from the lowest
N – number of cases

What is the percentile rank of score 77? The Score placement of 77 is at the 30th.
Therefore:
Pr = (30 x 100%)/50
Pr = (3000%)/50
Pr = 60%
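
A Python sketch of the percentile rank for ungrouped data. It assumes, as in the example, that the score placement SP is the position of the score counted from the lowest; for tied scores this sketch takes the first occurrence, which is an assumption the notes do not address.

```python
def percentile_rank(scores, score):
    """Pr = (SP x 100) / N, with SP counted from the lowest score."""
    ordered = sorted(scores)
    placement = ordered.index(score) + 1   # SP; first occurrence if tied
    return placement * 100 / len(ordered)

scores = [48, 94, 89, 91, 85, 98, 68, 79, 91, 62, 98,
          53, 54, 76, 66, 62, 62, 99, 96, 59, 72, 64, 55,
          82, 68, 78, 77, 93, 70, 49, 57, 79, 88, 61, 90,
          73, 70, 59, 51, 92, 43, 69, 46, 52, 100, 73, 61,
          93, 83, 58]
pr_77 = percentile_rank(scores, 77)
print(pr_77)  # 60.0
```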

Compute the percentile rank of score 53, Pr53.
Compute the percentile rank of score 88.5, Pr88.5.


For Grouped Data:

Pr = {cf + [(X – L.L.)/i] f} ÷ N x 100%

Where:
L.L. = exact lower limit of the class interval containing the given score
f = frequency of the class interval containing the score
i = class size
N = number of cases
cf = cumulative frequency of the class interval just below the one containing the given score
X = given score whose percentile rank is desired

Class Interval, X    f    <cf
100-104              1    50
95-99                4    49
90-94                7    45
85-89                3    38
80-84                2    35
75-79                5    33   (interval containing 76.5, f = 5)
70-74                5    28   (cf = 28)
65-69                4    23
60-64                7    19
55-59                4    12
50-54                4     8
45-49                3     4
40-44                1     1
N = 50

Example: Compute the percentile rank of score 76.5.

Score 76.5 is in class interval 75-79. The exact lower limit (L.L.) is 74.5, f = 5, i = 5, and cf = 28, the cumulative frequency of the interval just below.

Pr = {cf + [(X – L.L.)/i] f} ÷ N x 100%
Pr = {28 + [(76.5 – 74.5)/5] 5} ÷ 50 x 100%
Pr = (28 + 2) ÷ 50 x 100%
Pr = 60%

Therefore, the score 76.5 has a percentile rank of 60%.
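
The grouped percentile rank can be sketched in Python. The function name is illustrative; it assumes whole-number class limits, and it counts a proportional share of the frequency f of the interval that contains the score, as in the worked example.

```python
def grouped_percentile_rank(classes, width, score):
    """classes: (lower limit, frequency) pairs, lowest interval first."""
    n = sum(f for _, f in classes)
    cf = 0  # cumulative frequency below the current interval
    for lower, f in classes:
        exact_lower = lower - 0.5   # L.L.
        if exact_lower <= score < exact_lower + width:
            return (cf + (score - exact_lower) / width * f) * 100 / n
        cf += f
    raise ValueError("score lies outside the table")

classes = [(40, 1), (45, 3), (50, 4), (55, 4), (60, 7), (65, 4), (70, 5),
           (75, 5), (80, 2), (85, 3), (90, 7), (95, 4), (100, 1)]
pr = grouped_percentile_rank(classes, 5, 76.5)
print(pr)  # 60.0
```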

Compute the Pr of score 50. Compute the Pr of score 92

Find the score of P25 . Find the score of P75 .


MEASURES OF VARIABILITY

Previously, we discussed the measures of central tendency as a means of describing a given set of data. These measures indicate the point where the items are centrally located. However, they do not show whether the items in the distribution are far from or close to each other.

Example: Set A 15 15 17 18 20 Mean = 17
Set B 11 13 18 18 25 Mean = 17
Set C 14 15 16 19 21 Mean = 17
Set D 14 15 18 19 19 Mean = 17

Hence, the description of a set of data becomes more meaningful if the degree of clustering about a central point is measured. Information on how far apart the observations are from each other in every set will be very useful.

VARIABILITY – is the degree of spread or dispersion of the scores from the mean.

Measures of Variability

1. Range – the difference between the highest and lowest scores or, for grouped data, between the exact upper limit of the highest class and the exact lower limit of the lowest class. It is a crude and unstable measure of variability because it depends only on the extreme scores.

Example for Ungrouped Data

Group A: 89 84 83 78 76 70 Range = H.S. – L.S. = 89 – 70 = 19 (homogeneous group)
Group B: 93 90 88 80 76 69 Range = H.S. – L.S. = 93 – 69 = 24 (heterogeneous group)

Example for Grouped Data

Group A: class intervals from 5-9 (lowest) to 45-49 (highest)
Range = 49.5 – 4.5
Range = 45 (heterogeneous)

Group B: class intervals from 15-19 (lowest) to 40-44 (highest)
Range = 44.5 – 14.5
Range = 30 (homogeneous)

Since the range of group A is greater than the range of group B, group A is considered the heterogeneous group and group B the homogeneous group.



2. Quartile Deviation (semi-interquartile range) – half the spread between the first quartile (Q1) and the third quartile (Q3). In a symmetrical distribution this equals the distance from Q1 to the median, or from the median to Q3.

Q = (Q3 – Q1)/2

Example: Group A: 89 84 83 78 76 70 N=6

Q1 = ¼N = ¼(6) = 1.5 th Q1 = 73
Q3 = ¾N = ¾(6) = 4.5 th Q3 = 83.5

Q = (Q3 – Q1)/2 = (83.5 – 73)/2 = 10.5/2 = 5.25 (homogeneous)

Group B: 93 90 88 80 76 69 N=6

Q1 = ¼N = ¼(6) = 1.5 th Q1 = 72.5
Q3 = ¾N = ¾(6) = 4.5 th Q3 = 89

Q = (Q3 – Q1)/2 = (89 – 72.5)/2 = 16.5/2 = 8.25 (heterogeneous)

Since the quartile deviation, Q, of group B is greater than the quartile deviation of group A, group B is the heterogeneous group and group A is the homogeneous group.
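
A Python sketch of the quartile deviation for the two groups. The helper score_at_position is an illustrative assumption that counts positions from the lowest score and interpolates between neighboring scores, matching the hand method used in the notes.

```python
def score_at_position(scores, position):
    """Score at a 1-based, possibly fractional position counted from the lowest."""
    ordered = sorted(scores)
    whole = int(position)
    fraction = position - whole
    if fraction == 0 or whole >= len(ordered):
        return ordered[whole - 1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

def quartile_deviation(scores):
    n = len(scores)
    q1 = score_at_position(scores, n / 4)       # the (N/4)-th score
    q3 = score_at_position(scores, 3 * n / 4)   # the (3N/4)-th score
    return (q3 - q1) / 2

group_a = [89, 84, 83, 78, 76, 70]
group_b = [93, 90, 88, 80, 76, 69]
q_a = quartile_deviation(group_a)
q_b = quartile_deviation(group_b)
print(q_a, q_b)  # 5.25 8.25
```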

3. Mean or Average Deviation, MAD:

For Ungrouped Data: MAD = ∑|x – x̅|/N

x = individual score
x̅ = mean of the scores
∑|x – x̅| = sum of absolute deviations from the mean

Example: Compute the mean or average deviation, MAD, of the given scores.

Scores: 15 15 17 18 20 N=5

Score, x    (x – x̅)    |x – x̅|
15          -2          2
15          -2          2
17           0          0
18           1          1
20           3          3
∑x = 85                 ∑|x – x̅| = 8
x̅ = 17

MAD = ∑|x – x̅|/N
MAD = 8/5
MAD = 1.6
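
The mean deviation of the five scores can be checked with a few lines of Python; the variable names are illustrative.

```python
scores = [15, 15, 17, 18, 20]

n = len(scores)
mean = sum(scores) / n                           # 85 / 5 = 17.0
abs_deviations = [abs(x - mean) for x in scores]
mad = sum(abs_deviations) / n                    # 8 / 5

print(mean, sum(abs_deviations), mad)  # 17.0 8.0 1.6
```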


For Grouped Data: MAD = ∑f|M – x̅|/N

Class Interval    f    M     fM     |M – x̅|    f|M – x̅|
or scores, X
30-34             2    32    64     10.56      21.12
25-29             4    27    108    5.56       22.24
20-24             5    22    110    0.56       2.80
15-19             4    17    68     4.44       17.76
10-14             3    12    36     9.44       28.32
N=18                   ∑fM=386                 ∑f|M – x̅| = 92.24

x̅ = ∑fM/N
x̅ = 386/18 = 21.44

MAD = ∑f|M – x̅|/N
MAD = 92.24/18
MAD = 5.12
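
A Python sketch of the grouped mean deviation, using the midpoints and frequencies of the example. The sketch keeps the mean unrounded, so its intermediate sums differ slightly from hand values based on the rounded mean 21.44, but the result agrees to two decimal places.

```python
# (midpoint M, frequency f) for each class interval
classes = [(32, 2), (27, 4), (22, 5), (17, 4), (12, 3)]

n = sum(f for _, f in classes)                        # N = 18
mean = sum(f * m for m, f in classes) / n             # ΣfM / N = 386 / 18
mad = sum(f * abs(m - mean) for m, f in classes) / n  # Σf|M - mean| / N

print(n, round(mean, 2), round(mad, 2))  # 18 21.44 5.12
```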

4. Standard Deviation, SD – the standard distance of the scores from the mean. It is widely employed in statistical operations.

For Ungrouped Data:

SD = √[∑d²/(N – 1)] for sample data
SD = √(∑d²/N) for population data

where: d = x – x̅, the difference between a score and the mean.
d² = (x – x̅)², the squared deviation from the mean.

Example: Compute the standard deviation, SD, of the given scores.

Scores: 89 84 83 78 76 70 N=6

x     d = x – x̅    d² = (x – x̅)²
89    9            81
84    4            16
83    3            9
78    -2           4
76    -4           16
70    -10          100
∑x = 480           ∑d² = 226
x̅ = 80

SD = √(∑d²/N) = √(226/6) = 6.14 population SD
SD = √[∑d²/(N – 1)] = √[226/(6 – 1)] = 6.72 sample SD
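
A Python sketch of the two standard deviation formulas for the six scores. It takes the square root of the mean squared deviation, dividing by N for a population and by N - 1 for a sample; the names are illustrative.

```python
import math

scores = [89, 84, 83, 78, 76, 70]

n = len(scores)
mean = sum(scores) / n                            # 480 / 6 = 80.0
sum_d2 = sum((x - mean) ** 2 for x in scores)     # Σd² = 226.0

population_sd = math.sqrt(sum_d2 / n)
sample_sd = math.sqrt(sum_d2 / (n - 1))

print(round(population_sd, 2), round(sample_sd, 2))  # 6.14 6.72
```

The standard library functions statistics.pstdev and statistics.stdev compute the same two quantities.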


For Ungrouped Data (raw-score formula):

SD = √{[∑x² – (∑x)²/N] ÷ N}          Population SD
SD = √{[∑x² – (∑x)²/N] ÷ (N – 1)}    Sample SD

x     x²
89    7921
84    7056
83    6889
78    6084
76    5776
70    4900
∑x = 480    ∑x² = 38626

SD = √{[∑x² – (∑x)²/N] ÷ N}
SD = √{[38626 – (480)²/6] ÷ 6}
SD = 6.14 population SD

SD = 6.72 sample SD, dividing by (N – 1) instead of N (pls. compute)

For Grouped Data: (1. MIDPOINT METHOD)

SD = √{[∑fM² – (∑fM)²/N] ÷ N}          for population SD
SD = √{[∑fM² – (∑fM)²/N] ÷ (N – 1)}    for sample SD

Example: Compute the standard deviation, SD, of the given scores.

Scores: 25 37 13 32 19 28 35 30 26 27
36 34 48 11 41 24 33 29 15
41 26 20 24 26 6 22 9 17
44 39
N=30

Class Interval    f    M     fM     fM²
45-49             1    47    47     2209
40-44             3    42    126    5292
35-39             4    37    148    5476
30-34             4    32    128    4096
25-29             7    27    189    5103
20-24             4    22    88     1936
15-19             3    17    51     867
10-14             2    12    24     288
5-9               2    7     14     98
N=30                   ∑fM=815      ∑fM²=25365

SD = √{[∑fM² – (∑fM)²/N] ÷ N}
SD = √{[25365 – (815)²/30] ÷ 30}
SD = 10.37 population SD

SD = 10.54 sample SD, dividing by (N – 1) instead of N (pls. compute)
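
The midpoint method for the grouped standard deviation can be sketched in Python, using the midpoints and frequencies of the table; the names are illustrative.

```python
import math

# (midpoint M, frequency f) for each class interval, highest first
classes = [(47, 1), (42, 3), (37, 4), (32, 4), (27, 7),
           (22, 4), (17, 3), (12, 2), (7, 2)]

n = sum(f for _, f in classes)
sum_fm = sum(f * m for m, f in classes)        # ΣfM
sum_fm2 = sum(f * m * m for m, f in classes)   # ΣfM²

corrected = sum_fm2 - sum_fm ** 2 / n          # ΣfM² - (ΣfM)²/N
population_sd = math.sqrt(corrected / n)
sample_sd = math.sqrt(corrected / (n - 1))

print(round(population_sd, 2), round(sample_sd, 2))  # 10.37 10.54
```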



For Grouped Data: (2. DEVIATION METHOD)

SD = i √{[∑fd² – (∑fd)²/N] ÷ N}          for population SD
SD = i √{[∑fd² – (∑fd)²/N] ÷ (N – 1)}    for sample SD

Class Interval    f    d     fd    fd²
45-49             1    4     4     16
40-44             3    3     9     27
35-39             4    2     8     16
30-34             4    1     4     4
25-29             7    0     0     0
20-24             4    -1    -4    4
15-19             3    -2    -6    12
10-14             2    -3    -6    18
5-9               2    -4    -8    32
N=30                   ∑fd=1       ∑fd²=129

SD = i √{[∑fd² – (∑fd)²/N] ÷ N}
SD = 5 √{[129 – (1)²/30] ÷ 30}
SD = 10.37 population SD

SD = 10.54 sample SD, dividing by (N – 1) instead of N (pls. compute)
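
A Python sketch of the deviation method, using the d and f columns of the table and class size i = 5; the names are illustrative.

```python
import math

# (deviation d, frequency f) for each class interval, highest first
classes = [(4, 1), (3, 3), (2, 4), (1, 4), (0, 7),
           (-1, 4), (-2, 3), (-3, 2), (-4, 2)]
width = 5  # class size i

n = sum(f for _, f in classes)
sum_fd = sum(f * d for d, f in classes)        # Σfd
sum_fd2 = sum(f * d * d for d, f in classes)   # Σfd²

corrected = sum_fd2 - sum_fd ** 2 / n          # Σfd² - (Σfd)²/N
population_sd = width * math.sqrt(corrected / n)
sample_sd = width * math.sqrt(corrected / (n - 1))

print(sum_fd, sum_fd2, round(population_sd, 2), round(sample_sd, 2))
# 1 129 10.37 10.54
```
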
EXERCISES:
1. Compute the standard deviation of the given data. (Ungrouped)


2. Compute the standard deviation of the given set of scores using the two methods. (Grouped)

SHAPES OF FREQUENCY POLYGON

A. According to height, KURTOSIS – the peakedness or flatness of the polygon.

[Figure: curves labeled Leptokurtic, Mesokurtic (Normal Curve), and Platykurtic]

If Kurtosis, Ku = 0.263, the polygon is mesokurtic (normal curve).
If Ku > 0.263, the polygon is platykurtic.
If Ku < 0.263, the polygon is leptokurtic.

Kurtosis, Ku = Q ÷ (P90 – P10)

where: Q = (Q3 – Q1)/2, the quartile deviation
P10 = Percentile 10
P90 = Percentile 90

Example: Determine the shape of the frequency polygon according to height (kurtosis) of the
given data.

25 37 13 32 19 28 35 30 26 27 36 34 48
44 41 24 33 29 15 41 26 20 24 26 6 22
9 17 11 39 N=30

Construct an FDT:

Class Interval    f    <cf
45-49             1    30
40-44             3    29   (P90 interval)
35-39             4    26   (Q3 interval)
30-34             4    22
25-29             7    18
20-24             4    11   (Q1 interval)
15-19             3     7
10-14             2     4   (P10 interval)
5-9               2     2
N = 30

Solve Q3:
Q3 = L.L. + [(¾N – cf) ÷ f] i
Q3 = 34.5 + {[¾(30) – 22] ÷ 4} 5
Q3 = 34.5 + 0.625
Q3 = 35.125

Solve Q1:
Q1 = L.L. + [(¼N – cf) ÷ f] i
Q1 = 19.5 + {[¼(30) – 7] ÷ 4} 5
Q1 = 19.5 + 0.625
Q1 = 20.125

Solve Q:
Q = (Q3 – Q1)/2
Q = (35.125 – 20.125)/2
Q = 15/2
Q = 7.5

Solve P90:
P90 = L.L. + [(%N – cf) ÷ f] i
P90 = 39.5 + {[90%(30) – 26] ÷ 3} 5
P90 = 39.5 + 1.667
P90 = 41.167

Solve P10:
P10 = L.L. + [(%N – cf) ÷ f] i
P10 = 9.5 + {[10%(30) – 2] ÷ 2} 5
P10 = 9.5 + 2.5
P10 = 12.00

Solve Ku:
Ku = Q ÷ (P90 – P10)
Ku = 7.5 ÷ (41.167 – 12)
Ku = 7.5/29.167
Ku = 0.257

Since Ku = 0.257 < 0.263, the polygon or distribution graph is leptokurtic.

[Graph: frequency polygon of the distribution; vertical axis: frequency (0 to 7); horizontal axis: class midpoints 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52]
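
The kurtosis computation can be sketched in Python. The helper grouped_percentile is a hypothetical function that applies the grouped-data percentile formula, assuming whole-number scores (exact lower limit = lower limit minus 0.5).

```python
def grouped_percentile(classes, width, percent):
    """classes: (lower limit, frequency) pairs, lowest interval first."""
    n = sum(f for _, f in classes)
    target = percent * n / 100   # %N
    cf = 0                       # cumulative frequency below the current interval
    for lower, f in classes:
        if cf + f >= target:
            return (lower - 0.5) + (target - cf) / f * width
        cf += f

classes = [(5, 2), (10, 2), (15, 3), (20, 4), (25, 7),
           (30, 4), (35, 4), (40, 3), (45, 1)]

q1 = grouped_percentile(classes, 5, 25)
q3 = grouped_percentile(classes, 5, 75)
p10 = grouped_percentile(classes, 5, 10)
p90 = grouped_percentile(classes, 5, 90)

q = (q3 - q1) / 2        # quartile deviation
ku = q / (p90 - p10)     # percentile coefficient of kurtosis

print(q1, q3, p10, round(p90, 3), round(ku, 3))  # 20.125 35.125 12.0 41.167 0.257
```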



SHAPES OF FREQUENCY POLYGON

B. According to symmetry of the sides, SKEWNESS – the tendency of a distribution to depart from symmetry or balance. It is a measure of the deviation of a distribution from symmetry, or the tendency of values to be more spread out on one side of the mean than on the other.

Skewness describes an asymmetrical distribution curve. If the tail of the curve extends to the right, the skew is positive; if the tail extends to the left, the skew is negative.

[Figure: three frequency curves, (a) negatively skewed, (b) normal (no skew), and (c) positively skewed, each marking the positions of the mean, median, and mode]

(a) Negatively skewed
1. More high scores than low scores.
2. Tail is towards the left.
3. The mean will have a lower numerical value than the median because the extremely low scores pull the mean to the left.
4. Mode has a high numerical value.

(b) Normal curve (no skew)
1. Most scores are concentrated at the center of the distribution.
2. Both sides are symmetrically balanced.
3. The median, mean and mode are all located at one point.

(c) Positively skewed
1. More low scores than high scores.
2. Tail is towards the right.
3. Mode corresponds to a low value.
4. Mean, which is sensitive to each score value, will be pulled in the direction of the extreme scores and will have a high value.
5. Median, which is unaffected by extreme values, will have a value between the mode and the mean.

A. The Mean is the score point on the X-axis which corresponds to the point of balance or
fulcrum of the distribution.

B. The median is the score point which bisects the total area. Half the area would fall to the left
and half to the right of an ordinate drawn at the median.

C. The mode is the score point with the greatest frequency, the point on the X-axis, which
corresponds to the tallest point of the curve.


SKEWNESS, SK = 3(Mean – Median) ÷ SD

If SK = 0, the polygon or the distribution is symmetrical (no skew), as in the normal curve.
If SK > 0, skew is positive.
If SK < 0, skew is negative.

Please refer to the previous example. Compute the SKEWNESS of the given data.

25 37 13 32 19 28 35 30 26 27 36 34 48 44 41
24 33 29 15 41 26 20 24 26 6 22 9 17 11 37
N = 30
The computed mean = 27.17
The computed median = 27.36
The computed SD = 10.54
The computed theoretical mode = 27.74

SK = 3(Mean – Median) ÷ SD
SK = 3(27.17 – 27.36) ÷ 10.54
SK = -0.57 ÷ 10.54
SK = -0.0541 < 0 Negative skew

The distribution is skewed to the left (negatively skewed); the tail is towards the left.
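
The skewness formula is a one-line computation; a Python sketch with the computed values as inputs follows. The function name is illustrative.

```python
def skewness(mean, median, sd):
    """SK = 3(Mean - Median) / SD."""
    return 3 * (mean - median) / sd

sk = skewness(27.17, 27.36, 10.54)
direction = "positive" if sk > 0 else "negative" if sk < 0 else "none"

print(round(sk, 4), direction)  # -0.0541 negative
```

The value is very close to zero, so the distribution is only slightly skewed to the left.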

GRAPH: [Frequency curve of the distribution, negatively skewed, with the mode marked; vertical axis: frequency]


THE NORMAL CURVE

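[Figure: the normal curve. The mean, median, and mode coincide at the center (µ = 0); the baseline is marked at -3σ, -2σ, -1σ, µ = 0, 1σ, 2σ, 3σ, equivalently -3SD to 3SD.]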

1. The curve is symmetrical and bell-shaped. It has its highest point at the center, and the two sides fall off in opposite directions at exactly equal distances from the center. Therefore, when the curve is folded at the middle, the two sides have exactly the same size and shape.

2. The number of cases, N, is infinite. This is why the curve is asymptotic to the baseline or axis and extends infinitely in both directions.

3. The three measures of central tendency coincide at one point at the center of the distribution.

4. The height of the curve indicates the frequency of cases, expressed as probability, proportion or percentage. Hence, the total area under the normal curve is 1.0 in terms of probability or proportion and 100% in terms of percentage. Thus one-half of the area is 0.5 or 50%.

5. The basic unit of measurement along the baseline is the sigma unit (σ) or standard deviation (SD). The sigma units are also called Z-scores (x/σ, where x is the deviation of a score from the mean).

6. Two parameters are used to describe the curve. One is the mean, which is equal to zero (µ=0), and the other is the standard deviation, which is equal to 1 (σ=1).

7. Z-scores to the right of µ=0, that is, above the mean, are expressed as positive values, while Z-scores to the left of the mean, that is, below the mean, are expressed as negative values.


AREAS UNDER THE NORMAL CURVE

The area under the normal curve is directly related to the distance of the sigma (σ) unit
from the parameter mean (µ). The total area of the curve is 1.0 in terms of probability or
proportion, or 100% in terms of percentage.

Example: The area between the mean (µ=0) and +1.0σ (1SD) is .3413 of the total area; that is, about 34.13% of the cases score between the mean and 1 standard deviation above the mean.

[Figure: normal curve with the area from µ=0 to 1σ (1SD) shaded; A = 0.3413 = 34.13%]

Since the curve is symmetrical, the portion between the mean (µ=0) and -1.0σ (-1SD) is also .3413 in proportion, or 34.13% of the cases.
Hence, the area between -1.0σ and 1.0σ (between -1SD and 1SD) is .6826 of the total area, or 68.26% of the cases represented by the area under the normal curve.
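
These areas can be checked without a printed table. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area between the mean (Z = 0) and Z = +1
area_0_to_1 = standard_normal.cdf(1) - standard_normal.cdf(0)
# Area between Z = -1 and Z = +1
area_within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(round(area_0_to_1, 4), round(area_within_1, 4))  # 0.3413 0.6827
```

The second value prints as 0.6827 rather than .6826 because the table value is obtained by doubling the already rounded .3413.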
[Figure: normal curve with the areas from µ=0 to -1σ (-1SD) and from µ=0 to 1σ (1SD) shaded, each A = 0.3413 = 34.13%; total area 0.6826 or 68.26%]


THE CONVERSION FORMULA

Z-score = (x – x̅) ÷ SD

Where:
Z-score – the sigma unit value
x – the score value
x̅ – the mean
SD – standard deviation
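
A small Python sketch of the conversion formula. The numbers are borrowed from the reading ability problem in the applications below (mean 40, SD 4.0, score 46), and the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Convert a raw score x to sigma units: Z = (x - mean) / SD."""
    return (x - mean) / sd

z = z_score(46, 40, 4.0)
print(z)  # 1.5
```

A Z-score of 1.5 means the score lies 1.5 standard deviations above the mean; the corresponding area is then read from the table of areas under the normal curve.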

Exercises: Use the table of “AREAS UNDER THE NORMAL CURVE”.

Find the proportion of area under the normal curve.

1. Between the mean and Z-score (σ) = 1.37
2. Between the mean and Z-score = -1.18
3. To the right of Z = 1.61
4. To the left of Z = -2.05
5. Between Z = -1.02 and Z = 1.96

APPLICATIONS OF THE NORMAL PROBABILITY CURVE

Problem: In a reading ability test with a sample of 150 cases, the mean score is 40 and the standard deviation is 4.0. Assuming normality of the distribution,
1. What percentage of the cases falls between the mean and a score of 46?

2. What is the probability that a score picked at random will lie above score 46?

3. What is the probability that a score will lie below score 46?

4. How many cases fall between scores 42 and 48?


TABLE OF AREAS UNDER THE NORMAL CURVE



NOTES IN STATISTICS

(For STE Program Use Only)

No portion of this “NOTES in STATISTICS” may be copied or reproduced in any form for distribution or sale without written permission of the author/compiler.

Prepared and Compiled


2015

GERSON D. DUMPASAN, C.E.


STE Program Administrator, Des.
PANABO NATIONAL HIGH SCHOOL
