Adv Stats Lessons

This document provides an overview of statistics, including defining key terms, describing the branches of statistics, and differentiating between descriptive and inferential statistics. Specifically, it defines statistics as dealing with collecting, organizing, analyzing, and interpreting data. It also defines key statistical concepts like population, sample, quantitative vs. qualitative data, discrete vs. continuous data, and levels of measurement. Additionally, it describes descriptive statistics as describing basic features of data through charts and summaries, while inferential statistics makes deductions about populations from samples through probability, hypothesis testing, and predictions. An example is provided to illustrate the difference.

LESSON 1
STATISTICS
After going through this module, you are expected to define statistics and other related
terms and explain the importance of statistics; define descriptive statistics and inferential
statistics and differentiate between them; discuss the history of statistics and the
developments made in it; identify the importance of statistics in different fields; identify
real-world problems that involve statistics; and discuss the importance of statistics in our
daily life.

Let’s Warm Up
Answer the following questions in your notebook:
1. What is Mathematics for you?
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________

2. What is Statistics for you?


_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________

3. Do you think Statistics can be used in our life? Why or why not?
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
Let’s Learn
Statistics is a branch of Mathematics that deals with the collection, organization, presentation,
analysis, and interpretation of data.
Statistics involves much more than simply drawing graphs and computing averages.
- In education, it is frequently used to describe test results.
- In science, the data resulting from experiments must be collected and analyzed. Diseases
are controlled through analysis designed to anticipate epidemics. The lifetime of a
battery can be tested in a laboratory.
- Manufacturers can provide better products at reasonable costs through the use of
statistical quality control techniques.
- In government, many kinds of statistical data are collected all the time.
- A knowledge of statistics can help you become more critical in your analysis of
information, so that you are not misled by manufactured polls, graphs, and
averages.
A population is the complete collection of all elements (scores, people, …) to be studied.
A census is a collection of data from every element in a population.
A sample is a subcollection of elements drawn from a population.
Example: A Mathematics teacher plans to choose five students from the math club to be in a
publicity photo. How could the teacher choose the students?
Solution:
1. The Math teacher could put the names of all the students in a box, mix the names, and
draw them without looking.
2. The Math teacher could choose the five students in the fourth row.
3. The teacher could mix the names of the boys and choose two from the group. The
teacher does the same for the girls.
4. The Math teacher could choose a group of four students in the corner of the last row.

The nature of data


Some data sets consist of numbers (such as weights), and others are non-numerical (such as
color). The terms quantitative data and qualitative data are often used to distinguish between
these two types.
Quantitative Data consist of numbers representing counts or measurements.
Qualitative Data can be separated into different categories that are distinguished by some
nonnumeric characteristics.
The following are examples of qualitative variables: gender, major classification, political party
affiliation, religious preference, marital status
Example: Classify the following as either quantitative or qualitative
a. Opinion on a political issue
b. Number of hospitals that have a nuclear center.
Solution:
a. Opinion is not a form of measurement but rather a classification, such as for or against;
therefore, it is qualitative.
b. The number of hospitals that have a nuclear center is a count variable and thus,
quantitative.
Discrete data result from either a finite number of possible values or countable possible values
such as 0, 1, 2, and so on.
Continuous data result from infinitely many possible values that can be associated with points
on a continuous scale in such a way that there are no gaps or interruptions.
Example: Classify the following as qualitative or quantitative. If a variable is quantitative
(numerical), further classify it as discrete or continuous.
a. age of congressmen
b. number of students in the auditorium
c. faculty rank
Solution:
a. quantitative – continuous
b. quantitative – discrete
c. qualitative
The nominal level of measurement is characterized by data that consist of names, labels, or
categories only.
The ordinal level of measurement involves data that may be arranged in some order but
differences between data values either cannot be determined or are meaningless.
The interval level of measurement is like the ordinal level, but meaningful amounts of
differences between data can be determined. It has no inherent (natural) zero starting point
(where none of the quantity is present).
The ratio level of measurement is the interval level modified to include the inherent zero
starting point (where zero indicates that none of the quantity is present).
Example: Determine which of the four levels of measurement (nominal, ordinal, interval, and
ratio) is used.
a. Average annual temperature in Tagaytay
b. Weights of garbage discarded by households
c. A judge rates some presentations as “good”
d. The political party to which each governor belongs
Solution:
a. Interval
b. Ratio
c. Ordinal
d. Nominal

BRANCHES OF STATISTICS
Descriptive Statistics
Descriptive Statistics is the part of statistics that quantitatively describes the characteristics
of a particular data set under study, with the help of a brief summary of the sample.

Inferential Statistics
Inferential Statistics is the type of statistics in which a random sample is drawn from a
large population in order to make deductions about the whole population from which the
sample is taken.

Difference between Descriptive and Inferential Statistics

Descriptive Statistics
- It describes the basic features of the situation.
- Charts, graphs, and tables.
- Data set is small.
- Measures of Central Tendency, Measures of Dispersion.
- Collection, organization, classification, summarization, analysis, interpretation, and
presentation of data.

Inferential Statistics
- It explains the chances of occurrence of an event or activity.
- Probability scores.
- Data set is large.
- Drawing conclusions, making comparisons, performing estimation, determining cause and
effect relationships, hypothesis testing, and making predictions.
Example 1: Among Covid-19 patients in an unnamed quarantine facility, 83% of those who had a
female nurse felt their nurse cared for their emotional well-being during their quarantine
time, and 61% of those who had a male nurse felt their nurse cared for their emotional
well-being during their quarantine time.
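Descriptive Statistics: [Bar chart: percentage of patients who felt their nurse cared for their emotional well-being, by sex of nurse (Female, Male); vertical axis from 0% to 90%]
Inferential Statistics: A female nurse is more likely to care for your emotional well-being.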
LESSON 2
SLOVIN’S FORMULA AND SAMPLING TECHNIQUES
After going through this module, you are expected to use Slovin’s formula to calculate the
sample size (n) given the population size (N) and a margin of error (e); define sampling;
identify and describe the different sampling techniques; and identify which sampling
technique should be used in a given condition.

Let’s Warm Up
Write your answers in your notebook. Give five reasons for the use of samples. The first one is given for you.
1. A sample saves time compared to doing a complete census which requires more time.
2.___________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
3.___________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
4.___________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
5.___________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________

Let’s Learn
Slovin’s Formula
It is used to calculate an appropriate sample size from a population.
$$n = \frac{N}{1 + Ne^2}$$
where
n = sample size
N = total population
e = margin of error or tolerance (for example, 5%)
The margin of error is a value that quantifies possible sampling error.
Example 1: A researcher wants to study the academic performance in Mathematics of students
in a certain school. The school has a population of 12,000 students. If the researcher allows a
margin of error of 5%, how many students must he include in his sample?
$$
\begin{aligned}
n &= \frac{N}{1 + Ne^2} \\
  &= \frac{12{,}000}{1 + 12{,}000(0.05)^2} \\
  &= \frac{12{,}000}{1 + 12{,}000(0.0025)} \\
  &= \frac{12{,}000}{1 + 30} \\
  &= \frac{12{,}000}{31} \\
  &\approx 387.097 \text{ or } 387
\end{aligned}
$$

Thus, the researcher must take 387 students as his sample.
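
The computation in Example 1 can also be checked with a short script. This is only a minimal sketch: the function name `slovin_sample_size` is made up for illustration, and the margin of error is assumed to be given as a decimal (0.05 for 5%).

```python
def slovin_sample_size(population, margin_of_error):
    """Return the unrounded sample size n = N / (1 + N * e^2)."""
    return population / (1 + population * margin_of_error ** 2)


# Values from Example 1: N = 12,000 and e = 5%.
n = slovin_sample_size(12_000, 0.05)
print(round(n, 3))  # 387.097
print(round(n))     # 387
```

The function returns the unrounded value so that the rounding convention stays visible and can be chosen by the researcher.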


Example 2: The student population of June’s College is 2,436. Compute the sample size
using Slovin’s formula. Use e = 5%.
Solution:
$$
\begin{aligned}
n &= \frac{N}{1 + Ne^2} \\
  &= \frac{2{,}436}{1 + 2{,}436(0.05)^2} \\
  &= \frac{2{,}436}{1 + 6.09} \\
  &= \frac{2{,}436}{7.09} \\
  &\approx 343.58 \\
n &\approx 344
\end{aligned}
$$
Thus, the researcher must take 344 students as his sample.

TYPES OF RANDOM SAMPLING


Simple/Lottery Random Sampling is a process where every member of the population has an
equal chance of being selected.
Example: The names of the members of the population are written on small pieces of paper,
which are then mixed together and picked out at random. The selected members are included
in the sample.
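
The lottery method can be imitated in code. The sketch below is only an illustration: the member names and the helper `lottery_sample` are hypothetical, and Python's standard `random` module stands in for mixing and drawing the pieces of paper.

```python
import random


def lottery_sample(names, k, seed=None):
    """Draw k distinct names so that every name has an equal chance."""
    rng = random.Random(seed)    # a seed makes the draw repeatable
    return rng.sample(names, k)  # sampling without replacement


# Hypothetical population of eight members.
members = ["Ana", "Ben", "Carla", "Dino", "Ella", "Felix", "Gina", "Hugo"]
chosen = lottery_sample(members, 5, seed=1)
print(chosen)
```

Sampling without replacement matches the paper lottery, since a name that has been picked out cannot be drawn again.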
LESSON 4
ORGANIZING DATA
After going through this module, you are expected to organize data in a frequency
distribution table and know the importance of frequency distributions, interpret and draw
conclusions from graphic and tabular presentations of data, and discuss the importance of
analyzing data.

Analyze the Pie Chart below.

[Pie chart: DRL'S FARM. Chickens 43%, Pigs 17%, Goats 16%, Cows 12%, Cats 8%, Dogs 4%]

_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________

A frequency distribution for qualitative data lists all categories and the number of elements
that belong to each of the categories. It is used to organize nominal-level or ordinal-level
data such as sex, business type, and civil status.
Constructing Frequency Distribution
A grouped frequency distribution is used when the range of the data set is large; the data
must be grouped into classes, whether it is categorical data or interval data. For interval data,
the classes are more than one unit in width.
A. Categorical Frequency Distribution
The categorical frequency distribution is used to organize nominal-level or ordinal-level
data. Some examples where we can apply this distribution are gender, business type, political
affiliation, and others.
Example: Twenty applicants were given a performance evaluation appraisal. The data set is
High High High Low Average
Average Low Average Average Average
Low Average Average High High
Low Low Average High High
Construct a frequency distribution for the data.
Solution:
Step 1: Construct a table as shown below.
Class Tally Frequency Percent
High
Average
Low
Step 2: Tally the raw data
Class Tally Frequency Percent
High IIII-II
Average IIII-III
Low IIII
Step 3: Convert the tallied data into numerical frequencies.
Class Tally Frequency Percent
High IIII-II 7
Average IIII-III 8
Low IIII 5
Step 4: Determine the percentage. The percentage is computed using the formula:
$\% = \frac{f}{n} \times 100\%$, where f = frequency of the class and n = total number of values.
Class Tally Frequency Percent Found by
High IIII-II 7 35 (7÷20)×100
Average IIII-III 8 40 (8÷20)×100
Low IIII 5 25 (5÷20)×100
Total 20 100
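
The four steps above can be sketched in Python. This is a minimal illustration, assuming the twenty ratings are typed in as a list; `collections.Counter` does the tallying, and the variable names are chosen here for illustration only.

```python
from collections import Counter

# The twenty performance ratings from the example, row by row.
ratings = (
    "High High High Low Average "
    "Average Low Average Average Average "
    "Low Average Average High High "
    "Low Low Average High High"
).split()

counts = Counter(ratings)  # tally each class
n = len(ratings)           # total number of values

print(f"{'Class':<10}{'Frequency':>10}{'Percent':>10}")
for cls in ("High", "Average", "Low"):
    f = counts[cls]
    print(f"{cls:<10}{f:>10}{f / n * 100:>10.0f}")
print(f"{'Total':<10}{n:>10}{100:>10}")
```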
B. Determining Class Interval
Generally, the number of classes for a frequency distribution table varies from 5 to 20,
depending primarily on the number of observations in the data set. It is preferable to have more
classes as the size of a data set increases. The decision about the number of classes depends on
the method used by the researcher.
1. Rule 1: To determine the number of classes, use the smallest positive integer k such
that $2^k > n$, where n is the total number of observations.
$$\text{Suggested Class Interval } (i) = \frac{\text{Range}}{\text{Number of Classes}} = \frac{HV - LV}{k}$$
Where: HV = Highest value in a data set LV = Lowest value in a data set
k = number of classes i = suggested class interval
2. Rule 2: Another way to determine the class interval
$$\text{Suggested Class Interval} = \frac{\text{Range}}{1 + 3.322 \times (\text{logarithm of total frequencies})}$$
3. Rule 3: Another guideline to determine the class interval is to have an ideal number of
classes.
$$\text{Suggested Class Interval} = \frac{\text{Highest Value} - \text{Lowest Value}}{\text{Number of Classes}}$$
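
Rules 1 and 2 can be sketched as small functions. The function names are invented for illustration, and the base-10 logarithm in Rule 2 is an assumption, since the text only says "logarithm". The sample values (n = 80, HV = 35.70, LV = 16.10) are taken from the salary example in part C.

```python
import math


def classes_by_2k_rule(n):
    """Smallest positive integer k such that 2**k > n (Rule 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def interval_rule1(highest, lowest, n):
    """Range divided by the number of classes from the 2-to-the-k rule."""
    return (highest - lowest) / classes_by_2k_rule(n)


def interval_rule2(highest, lowest, n):
    """Range divided by 1 + 3.322 * log10(n); base 10 is assumed here."""
    return (highest - lowest) / (1 + 3.322 * math.log10(n))


k = classes_by_2k_rule(80)
print(k)                                       # 7
print(round(interval_rule1(35.70, 16.10, 80), 2))  # 2.8
print(round(interval_rule2(35.70, 16.10, 80), 2))
```

Either result is only a suggestion; in practice it is rounded to a convenient width.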

C. Grouped Frequency Distribution


Example: Suppose a researcher wishes to do a study on the monthly salary (in thousands of pesos)
of call center agents of selected call center companies. The researcher would first have to collect
the data by asking each call center agent about his or her monthly salary. Data collected in
original form are called raw data. In this case, the data are:
20.80 24.00 25.40 26.30 29.00 29.90 33.00 28.00 22.80 19.00
22.00 24.60 25.40 26.50 29.00 31.30 34.10 28.10 23.00 19.30
22.25 24.75 25.70 26.70 29.40 32.10 35.70 28.30 23.60 19.80
20.40 23.90 25.00 25.85 28.80 29.80 32.80 27.00 22.40 17.50
20.70 23.90 25.20 26.10 28.90 29.90 32.90 27.20 22.50 17.70
19.95 23.75 24.90 25.70 28.50 29.50 32.60 26.75 22.25 16.10
20.35 23.80 24.90 25.70 28.50 29.60 32.75 27.00 22.30 16.30
22.20 24.80 25.50 26.60 29.30 31.50 34.90 28.20 23.30 19.40
Construct a frequency distribution and determine the following:
a. Range e. Relative frequencies
b. Interval f. Percentages
c. Class limits g. Cumulative frequencies
d. Class boundaries h. Midpoints
Solution:
Step 1: Arrange the raw data in ascending or descending order. In this particular example we
will arrange raw data in ascending order. This will make it easier for us to tally the data.
16.10 19.95 22.25 23.75 24.90 25.70 26.75 28.50 29.50 32.60
16.30 20.35 22.30 23.80 24.90 25.70 27.00 28.50 29.60 32.75
17.50 20.40 22.40 23.90 25.00 25.85 27.00 28.80 29.80 32.80
17.70 20.70 22.50 23.90 25.20 26.10 27.20 28.90 29.90 32.90
19.00 20.80 22.80 24.00 25.40 26.30 28.00 29.00 29.90 33.00
19.30 22.00 23.00 24.60 25.40 26.50 28.10 29.00 31.30 34.10
19.40 22.20 23.30 24.75 25.50 26.60 28.20 29.30 31.50 34.90
19.80 22.25 23.60 24.80 25.70 26.70 28.30 29.40 32.10 35.70
Step 2: Determine the classes.
- Find the highest and lowest value.
Highest Value (HV) = 35.70 and Lowest Value (LV) = 16.10
- Find the range.
Range = Highest Value (HV) – Lowest Value (LV) = 35.70 – 16.10 = 19.60
- Determine the number of classes.
The objective is to use just enough classes. We can determine the number of classes
(k) using the “2 to the k rule”. This will enable us to select the smallest number (k) for the
number of classes such that $2^k$ (2 raised to the power k) is greater than the number of
observations (n). In our example, there are 80 call center agents (or n = 80). If we try
k = 6, which means we would use 6 classes, then $2^k = 2^6 = 64$, which is less than 80.
Thus, 6 is not enough classes. If we try k = 7, then $2^k = 2^7 = 128$, which is greater than 80.
Therefore, the recommended number of classes is 7.
- Determine the class interval (or width).
Generally, the class interval (or width) should be equal for all classes. The classes
must cover all the values in the raw data (that is, from lowest to highest). The class
interval is generated using the formula:
$$\text{Suggested Class Interval } (i) = \frac{\text{Range}}{\text{Number of Classes}} = \frac{HV - LV}{k} = \frac{19.60}{7} = 2.80 \approx 3$$

- Select a starting point for the lowest class limit.
The starting point can be the smallest data value or any convenient number less than
the smallest data value. In our case 16 is used.
- Set the individual class limits.
We need to add the interval (or width) to the lowest score taken as the starting point
to obtain the lower limit of the next class. Keep adding until we reach the 7 classes, which
gives the lower limits 16, 19, 22, 25, 28, 31, and 34.
To obtain the upper class limits, we need to subtract one unit from the lower limit of the
second class to obtain the upper limit of the first class. That is, 19 – 1 = 18. Then add the
interval (or width) to each upper limit to obtain all the upper limits.
Class Limits
16 – 18
19 – 21
22 – 24
25 – 27
28 – 30
31 – 33
34 – 36
- Set the class boundaries in each class. To obtain the class boundaries, we need to
subtract 0.5 from each lower class limit and add 0.5 to each upper class limit.
Class Limits   Class Boundaries
16 – 18   15.5 – 18.5
19 – 21   18.5 – 21.5
22 – 24   21.5 – 24.5
25 – 27   24.5 – 27.5
28 – 30   27.5 – 30.5
31 – 33   30.5 – 33.5
34 – 36   33.5 – 36.5
Step 3: Tally the raw data.
Class Limits   Class Boundaries   Tally
16 – 18   15.5 – 18.5   IIII
19 – 21   18.5 – 21.5   IIII-IIII
22 – 24   21.5 – 24.5   IIII-IIII-IIII-I
25 – 27   24.5 – 27.5   IIII-IIII-IIII-IIII-III
28 – 30   27.5 – 30.5   IIII-IIII-IIII-II
31 – 33   30.5 – 33.5   IIII-III
34 – 36   33.5 – 36.5   III
Step 4: Convert the tallied data into numerical frequencies.
Class Limits   Class Boundaries   Tally   Frequency
16 – 18   15.5 – 18.5   IIII   4
19 – 21   18.5 – 21.5   IIII-IIII   9
22 – 24   21.5 – 24.5   IIII-IIII-IIII-I   16
25 – 27   24.5 – 27.5   IIII-IIII-IIII-IIII-III   23
28 – 30   27.5 – 30.5   IIII-IIII-IIII-II   17
31 – 33   30.5 – 33.5   IIII-III   8
34 – 36   33.5 – 36.5   III   3
Step 5: Determine the relative frequency. It can be found by dividing each frequency by the total
frequency.
Class Limits   Frequency   Relative Frequency   Found by
16 – 18   4   0.05   4÷80
19 – 21   9   0.11   9÷80
22 – 24   16   0.20   16÷80
25 – 27   23   0.29   23÷80
28 – 30   17   0.21   17÷80
31 – 33   8   0.10   8÷80
34 – 36   3   0.04   3÷80
Total   80   1.00

Step 6: Determine the percentage. It can be found by multiplying each relative
frequency by 100%.
Class Limits   Frequency   Percentage   Found by
16 – 18   4   5   (4÷80)×100
19 – 21   9   11   (9÷80)×100
22 – 24   16   20   (16÷80)×100
25 – 27   23   29   (23÷80)×100
28 – 30   17   21   (17÷80)×100
31 – 33   8   10   (8÷80)×100
34 – 36   3   4   (3÷80)×100
Total   80   100
Step 7: Determine the cumulative frequencies. The cumulative frequency can be found by
adding the frequency in each class to the total frequencies of the classes preceding that class.
Class Limits   Frequency   Cumulative Frequency   Found by
16 – 18   4   4   4
19 – 21   9   13   4+9
22 – 24   16   29   4+9+16
25 – 27   23   52   4+9+16+23
28 – 30   17   69   4+9+16+23+17
31 – 33   8   77   4+9+16+23+17+8
34 – 36   3   80   4+9+16+23+17+8+3
Step 8: Determine the midpoints. The midpoint can be found by getting the average of the upper
limit and lower limit in each class.
Class Limits   Frequency   Midpoint   Found by
16 – 18   4   17   (16+18)÷2
19 – 21   9   20   (19+21)÷2
22 – 24   16   23   (22+24)÷2
25 – 27   23   26   (25+27)÷2
28 – 30   17   29   (28+30)÷2
31 – 33   8   32   (31+33)÷2
34 – 36   3   35   (34+36)÷2
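
Steps 2 to 8 can be reproduced with a short script. This is a sketch only, assuming the class width of 3, the starting lower limit of 16, and the 7 classes chosen in Step 2; the list holds the ordered data from Step 1, and the variable names are illustrative.

```python
# Ordered salaries (in thousands of pesos) from Step 1.
salaries = [
    16.10, 16.30, 17.50, 17.70, 19.00, 19.30, 19.40, 19.80,
    19.95, 20.35, 20.40, 20.70, 20.80, 22.00, 22.20, 22.25,
    22.25, 22.30, 22.40, 22.50, 22.80, 23.00, 23.30, 23.60,
    23.75, 23.80, 23.90, 23.90, 24.00, 24.60, 24.75, 24.80,
    24.90, 24.90, 25.00, 25.20, 25.40, 25.40, 25.50, 25.70,
    25.70, 25.70, 25.85, 26.10, 26.30, 26.50, 26.60, 26.70,
    26.75, 27.00, 27.00, 27.20, 28.00, 28.10, 28.20, 28.30,
    28.50, 28.50, 28.80, 28.90, 29.00, 29.00, 29.30, 29.40,
    29.50, 29.60, 29.80, 29.90, 29.90, 31.30, 31.50, 32.10,
    32.60, 32.75, 32.80, 32.90, 33.00, 34.10, 34.90, 35.70,
]

start, width, k = 16, 3, 7  # assumed from Step 2
n = len(salaries)

rows = []
cumulative = 0
for i in range(k):
    lower = start + i * width                  # lower class limit
    upper = lower + width - 1                  # upper class limit
    low_b, high_b = lower - 0.5, upper + 0.5   # class boundaries
    f = sum(1 for x in salaries if low_b <= x < high_b)
    cumulative += f
    midpoint = (lower + upper) / 2
    rows.append((lower, upper, f, f / n, cumulative, midpoint))

frequencies = [row[2] for row in rows]
print(frequencies)  # [4, 9, 16, 23, 17, 8, 3]
for lower, upper, f, rel, cum, mid in rows:
    print(f"{lower} – {upper}  f={f:2d}  rel={rel:.2f}  "
          f"pct={rel * 100:.0f}  cum={cum:2d}  mid={mid:g}")
```

Counting against the class boundaries rather than the whole-number limits is what places values such as 24.60 in the 25 – 27 class.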

Stem-and-Leaf
A statistician named John Tukey introduced the stem-and-leaf plot. The objective of this
method is to overcome, to some extent, the loss of actual observations brought about by the
histogram. The advantage of the stem-and-leaf plot over the histogram is that we can see the
actual observations.
The stem is the leading digit or digits and the leaf is the trailing digit. The stem is placed
in the first column and the leaf in the second column.
Example: JDL Travel Agency, a nationwide local travel agency, offers special rates during the
summer period. The owner wants additional information on the ages of people taking travel tours. A
random sample of 50 customers taking travel tours last summer revealed these ages.
20 31 39 44 49 51 55 59 63 69
21 33 40 46 50 52 56 60 64 70
26 36 41 47 50 53 56 60 65 72
29 38 41 48 50 53 57 61 66 76
30 38 42 48 51 54 58 62 68 79
Construct a stem-and-leaf plot.
Solution:
The stems (leading digits) for the raw data are 2, 3, 4, 5, 6, 7. The leaves for each stem
(trailing digits) are recorded in the same row and are rank-ordered to form a stem-and-leaf plot.
Stem (tens digit)   Leaf (units digit)
2   0, 1, 6, 9
3   0, 1, 3, 6, 8, 8, 9
4   0, 1, 1, 2, 4, 6, 7, 8, 8, 9
5   0, 0, 0, 1, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9
6   0, 0, 1, 2, 3, 4, 5, 6, 8, 9
7   0, 2, 6, 9
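
A stem-and-leaf plot for two-digit data can be built with integer division and remainder. The sketch below assumes the 50 ages listed in the example; the variable names are illustrative.

```python
from collections import defaultdict

# The 50 ages from the JDL Travel Agency example.
ages = [
    20, 31, 39, 44, 49, 51, 55, 59, 63, 69,
    21, 33, 40, 46, 50, 52, 56, 60, 64, 70,
    26, 36, 41, 47, 50, 53, 56, 60, 65, 72,
    29, 38, 41, 48, 50, 53, 57, 61, 66, 76,
    30, 38, 42, 48, 51, 54, 58, 62, 68, 79,
]

stems = defaultdict(list)
for age in sorted(ages):                 # sorting rank-orders the leaves
    stems[age // 10].append(age % 10)    # tens digit = stem, units digit = leaf

for stem in sorted(stems):
    leaves = ", ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```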

Data analysis is a process of inspecting, cleansing, transforming and modeling data with the
goal of discovering useful information, informing conclusions and supporting decision-making.
Data analysis has multiple facets and approaches, encompassing diverse techniques under a
variety of names, and is used in different business, science, and social science domains. In
today's business world, data analysis plays a role in making decisions more scientific and
helping businesses operate more effectively.
Different Age
                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid     30 or younger   18          45        52.9            52.9
          Older than 30   16          40        47.1            100
          Total           34          85        100
Missing   System          6           15
Total                     40          100
It presents the age distribution of 40 respondents. The age range of the respondents was 21 – 72
years of age. More than half of the respondents who gave their age (18 of 34, or 52.9%) are 30
years old or younger; 16 respondents are older than 30 years. Six persons did not respond.

It shows the distribution of highest academic qualifications amongst the 50 respondents. It is
clear that the respondents are mostly literate, with only 3 respondents being illiterate. The most
common qualification for the respondents is an Honour’s Degree (17 respondents), followed by a
post-Standard 10, one-year Certificate or Diploma (8 respondents), and Standard 10 (6 respondents).
Four respondents hold Master’s Degrees. Most of the respondents have some form of education.

Respondents 50

It shows that most of the respondents use the Tagalog language (70% of the 50
respondents), while 30% use the English language.
BS Accountancy students in Santa Maria

It shows the population of 188 BS Accountancy students in Santa Maria. Accordingly, 39
of the respondents are from St. Joseph College of Bulacan, which constitutes
20.74% of the total population, and 149 students are from Polytechnic University of
the Philippines – Santa Maria Bulacan Campus, which constitutes 79.26% of the total
population.
BS Accountancy students in Santa Maria

It shows the population of 188 BS Accountancy students of Santa Maria. 72 or 38.30% of the
total population are from the 1st-year level; 51 or 27.13% of the total population are from the
2nd-year level; 35 or 18.62% of the total population are from the 3rd-year level; and 30 or
15.96% of the total population are from the 4th-year level. The population is composed of
students from St. Joseph College of Bulacan and from Polytechnic University of the Philippines
– Santa Maria Bulacan Campus.
(BS Accountancy students in Santa Maria) https://www.coursehero.com/file/12091460/Chapter-4-Thesis-Sample/

NO. OF TIMES OF ATTENDANCE OF AIDS WORKSHOPS


This table shows that only 4 (1.0%) of the subjects had not attended any workshop on AIDS
education campaigns. The rest (i.e. 380 or 98.9%) of the population had the opportunity to
attend such workshops although there is a vast difference in the number of times of their
attendance. The total number of those who attended workshops will be used as a total sample
size. The results are based on the subjects’ experiences and not on speculation or what they
believe or think, and should therefore be reliable. It is also interesting to note the high number of
subjects who attended these workshops more than once.
AGE GROUP BREAKDOWN

This table shows the age categories of subjects who took part in the completion of the
questionnaires. The percentage in this table shows that the allocation of questionnaires to
various groups was in no way influenced by bias. It is a true reflection of the researcher’s
impartiality in the distribution of questionnaires.
(No. of times of attendance of AIDS workshops, age group breakdown)
http://wiredspace.wits.ac.za/bitstream/handle/10539/1485/04chapter4.pdf?sequence=7
LESSON 5
MEAN, MEDIAN, MODE
After going through this module, you are expected to identify and apply the properties of
the mean, median, and mode, and to compute and interpret the mean, median, and mode of
grouped and ungrouped data.

What is the purpose of the mean, median, and mode in our daily life? Give five examples. (It is up
to you whether each example uses the mean, median, or mode.)
1. _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
2. _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
3. _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
4. _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
5. _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________

PROPERTIES OF THE MEAN


1. If all the observations of the series are equal to a constant k, the mean will also be k.

$\bar{x}$ = mean
Example: If every observation is x = 5 (5, 5, 5, 5, 5), then $\bar{x} = 5$. The mean is also 5.
2. If the deviations in the series are taken from the mean, the sum of the deviations from the
mean will be zero.
Example:
x   $x - \bar{x}$
3   3 − 3 = 0
2   2 − 3 = −1
1   1 − 3 = −2
5   5 − 3 = 2
4   4 − 3 = 1
$\sum x = 15$   $\sum (x - \bar{x}) = 0$
$$\bar{x} = \frac{\sum x}{n} = \frac{15}{5} = 3$$
3. The combined mean is the mean of two series taken together: $\bar{x}_{cm} = \bar{x}_{12}$

Example:
The Grade 9 level has seventy students: forty boys and thirty girls. The mean of the boys is
twelve and the mean of the girls is fifteen. What is the combined mean of the class?

$$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
$n_1$ = number of items in series 1
$n_2$ = number of items in series 2
$\bar{x}_1$ = mean of series 1
$\bar{x}_2$ = mean of series 2

$$
\begin{aligned}
\bar{x}_{12} &= \frac{(40)(12) + (30)(15)}{40 + 30} \\
             &= \frac{480 + 450}{70} \\
             &= \frac{930}{70} \\
             &\approx 13.29
\end{aligned}
$$

4. The combined mean can also be taken over three series: $\bar{x}_{cm} = \bar{x}_{123}$

Example: In a sports fest there are three basketball teams. Team A has five players and an
average score of 100. Team B has eight players and an average score of 120. Team C has six
players and an average score of eighty. Find the combined mean score.

$$\bar{x}_{123} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3}$$

$n_1$ = number of items in series 1
$n_2$ = number of items in series 2
$n_3$ = number of items in series 3
$\bar{x}_1$ = mean of series 1
$\bar{x}_2$ = mean of series 2
$\bar{x}_3$ = mean of series 3

$$
\begin{aligned}
\bar{x}_{123} &= \frac{(5)(100) + (8)(120) + (6)(80)}{5 + 8 + 6} \\
              &= \frac{500 + 960 + 480}{19} \\
              &= \frac{1940}{19} \\
              &\approx 102.11
\end{aligned}
$$

5. Change of origin and change of scale. A change of origin adds or subtracts a constant, and it
directly affects the mean: if $y = a + x$, then $\bar{y} = a + \bar{x}$. A change of scale
multiplies or divides by a constant, and it also directly affects the mean: if $y = bx$, then
$\bar{y} = b\bar{x}$.
x   x + 2   x − 1   2x
3   5   2   6
2   4   1   4
1   3   0   2
4   6   3   8
5   7   4   10
Sum:   15   25   10   30
Mean:   3   5   2   6

6. Mean can be calculated for any set of numerical data.


7. A set of numerical data has one and only one mean.
8. Mean is the most reliable measure of central tendency since it takes into account every item in
the set of data.
9. The mean is affected by unusually large or small data values.
10. The sum of the differences between individual observations and the mean is zero.
11. The product of the mean and the number of items on which the mean is based is equal to the
sum of all given items.
12. The sum of the squares of the deviations of a set of values about its mean is a minimum.
13. If each item of the original series is replaced by the actual mean, then the sum of those
substitutions will be equal to the sum of the individual items.
14. The mean is not independent of change of origin and change of scale.
WHEN TO USE THE MEAN?
The mean is used when both of the following conditions are met:
1. Data is scaled
a. Data with equal intervals like speed, weight, height, temperature, etc.
2. Distribution is normal
b. Because the mean is sensitive to the outliers found in skewed distributions, you should
only use the mean when the distribution is more or less normal.
The arithmetic mean, often called the mean, is the most frequently used measure of central
tendency. The mean is the only common measure in which all values play an equal role: to
determine its value, you need to consider all the values of the given data set.
The mean is appropriate for determining the central tendency of interval or ratio data.

The symbol $\bar{x}$, called “x bar”, is used to represent the mean of a sample, and the symbol $\mu$,
called “mu”, is used to denote the mean of a population.
MEAN FOR UNGROUPED DATA
$$\text{mean} = \frac{\text{sum of all values}}{\text{number of values}}$$

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

where: $\bar{x}$ = sample mean (read as “x bar”)
$x$ = the value of any particular observation or measurement
$\sum x$ = sum of all x’s
$n$ = total number of values in the sample

Population mean: $\mu = \dfrac{\sum x}{N}$

where: $\mu$ = population mean (read as “mu”)
$x$ = the value of any particular observation or measurement
$\sum x$ = sum of all x’s
$N$ = total number of values in the population
Example 1: The daily rates of a sample of eight students working at ACNSTHS are 550php,
420php, 560php, 500php, 700php, 670php, 860php, and 480php. Find the mean daily rate.

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
$$\bar{x} = \frac{550 + 420 + 560 + 500 + 700 + 670 + 860 + 480}{8}$$
$$\bar{x} = \frac{4740}{8}$$
$$\bar{x} = 592.50$$
The sample mean daily rate of the working students is 592.50php.
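As a quick check of Example 1, the sample mean can be computed directly from the eight daily rates. This is a minimal sketch; the variable names are illustrative.

```python
# Daily rates (in php) of the eight working students in Example 1.
rates = [550, 420, 560, 500, 700, 670, 860, 480]

total = sum(rates)               # sum of all x's
sample_mean = total / len(rates)  # divide by n

print(total)        # 4740
print(sample_mean)  # 592.5
```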
Example 2: Find the population mean of the ages of nine middle-management employees of a certain
makeup company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

$N$ = ____   $\sum x$ = ____   $\mu$ = ____


SAMPLE MEAN FOR GROUPED DATA

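Sample Mean: $\bar{x} = \dfrac{\sum fx}{n}$

where: $\bar{x}$ = sample mean (read as “x bar”)
$f$ = frequency
$x$ = the value of any particular observation or measurement (the class midpoint when the data are grouped into classes)
$\sum fx$ = sum of the products of each frequency and its value of x
$n$ = total number of values in the sample

Population Mean: $\mu = \dfrac{\sum fx}{N}$

where: $\mu$ = population mean (read as “mu”)
$f$ = frequency
$x$ = the value of any particular observation or measurement (the class midpoint when the data are grouped into classes)
$\sum fx$ = sum of the products of each frequency and its value of x
$N$ = total number of values in the population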
Example 3: DRL Cosmetic Agency, a nationwide cosmetic agency, offers special rates during the
summer period. The owner wants additional information on the ages of the people taking beauty
tours. A random sample of 50 customers who took a beauty tour last summer revealed these ages.
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Determine the mean of the frequency distribution of the ages of the 50 people taking beauty tours.
Given the table:
Class Limits    Frequency (f)
18-26           3
27-35           5
36-44           9
45-53           14
54-62           11
63-71           6
72-80           2
Solution:
Step 1: Determine the midpoint of each class.
Class Limits    (f)    Midpoint (X)
18-26           3      22
27-35           5      31
36-44           9      40
45-53           14     49
54-62           11     58
63-71           6      67
72-80           2      76

Step 2: Multiply each class frequency (f) by the corresponding midpoint (X) to obtain the
product fX, and get the sum of fX.
Class Limits    (f)    (X)    fX
18-26           3      22     66
27-35           5      31     155
36-44           9      40     360
45-53           14     49     686
54-62           11     58     638
63-71           6      67     402
72-80           2      76     152
Total           50     -----  $\sum fX = 2{,}459$

Step 3: Apply the formula to obtain the value of the sample mean.

$$\bar{x} = \frac{\sum fX}{n} = \frac{2459}{50} = 49.18$$
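The grouped-mean steps above (midpoints, products fX, division by n) can be sketched as follows. The function name and the `(lower limit, upper limit, frequency)` layout are illustrative choices, and the class limits assume non-overlapping classes of width 9 running from 18-26 to 72-80.

```python
def grouped_mean(classes):
    """Mean of a frequency distribution.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its midpoint X = (lower + upper) / 2.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return sum_fx / n


ages = [
    (18, 26, 3), (27, 35, 5), (36, 44, 9), (45, 53, 14),
    (54, 62, 11), (63, 71, 6), (72, 80, 2),
]
print(round(grouped_mean(ages), 2))  # 49.18
```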
PROPERTIES OF MEDIAN
1. If all the observations of the series are equal to a constant k, the median will also be k.
x: 5, 5, 5, 5, 5, 5, 5
Mean = 5
Median = 5
2. Change of origin and change of scale. Both a change of origin and a change of scale directly
affect the median (similar to the mean).
Given $3x - 2y + 8 = 0$ and median of $x = 1$, find the median of $y$.

Solution:
$$3x + 8 = 2y$$
$$2y = 3x + 8$$
$$y = \frac{3x}{2} + \frac{8}{2}$$
$$y = \frac{3}{2}x + 4$$
Substituting the median of $x$:
$$y = \frac{3}{2}(1) + 4 = \frac{11}{2}$$
Median of $y = \frac{11}{2}$
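3. If absolute deviations are taken from the median, the sum of the absolute deviations is a minimum.
For comparison, the table below takes the deviations from the value 5:
x     x - 5 (deviation from 5)     |x - 5| (absolute deviation)
3     3 - 5 = -2                   2
2     2 - 5 = -3                   3
5     5 - 5 = 0                    0
7     7 - 5 = 2                    2
3     3 - 5 = -2                   2
The sum of the absolute deviations from 5 is 2 + 3 + 0 + 2 + 2 = 9. The median of these data is 3
(ordered array: 2, 3, 3, 5, 7), and the sum of the absolute deviations from 3 is 0 + 1 + 2 + 4 + 0 = 7,
which is smaller.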

4. Median is not dependent on all the data values in a data set.

5. The median value is fixed by its position and is not affected by the size of the individual values.

6. The distance between the median and the rest of the values is less than the distance from any
other point.

7. Every array has a single median.

8. The median cannot be manipulated algebraically. It cannot be weighted and combined.

9. In a grouping procedure, the median is stable.

10. Median is not applicable to qualitative data.


11. The values must be grouped and ordered for computation.

12. Median can be determined for ratio, interval and ordinal scale.

13. Outliers and skewed data have less impact on the median.

14. If the distribution is skewed, the median is a better measure when compared to mean.
15. There is a unique median for each data set.
16. It is not affected by extremely large or small values and is therefore a valuable measure of
central tendency when such values occur.
17. The median is used when the data are ordinal.
18. The median can be determined by a graphic method.
19. The sum of absolute deviations taken from the median is no greater than the sum of the absolute
deviations taken from any other value.
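The minimum-absolute-deviation property can be checked numerically. The sketch below uses the five values 3, 2, 5, 7, 3 from the deviation table in property 3; `sum_abs_dev` is an illustrative helper name.

```python
from statistics import median

data = [3, 2, 5, 7, 3]


def sum_abs_dev(values, center):
    """Sum of absolute deviations of the values from a chosen center."""
    return sum(abs(v - center) for v in values)


med = median(data)              # ordered array: 2, 3, 3, 5, 7
best = sum_abs_dev(data, med)

print(med, best)                # 3 7
print(sum_abs_dev(data, 5))     # 9

# No other integer center from 0 to 10 gives a smaller sum than the median.
assert all(best <= sum_abs_dev(data, c) for c in range(0, 11))
```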
WHEN TO USE THE MEDIAN?
The median is used when either one of two conditions is met:
1. The data are ordinal.
2. The distribution is skewed or normal.

The median is the midpoint of the data array. A data set that has been ordered, whether in ascending
or descending order, is called a data array. The median is an appropriate measure of central tendency
for data that are ordinal or above, but it is most valuable for ordinal data.
MEDIAN FOR UNGROUPED DATA
To determine the median of ungrouped data, arrange the values in order and apply two rules:
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked values.
$$\text{median rank} = \frac{n + 1}{2}$$
Note that n is the population or sample size.
Example 1: Find the median of the ages of 9 middle-management employees of a certain
company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
Step 1: Arrange the data in order.
45, 46, 48, 51, 53, 54, 55, 58, 59
Step 2: Select the middle rank value.
$$\text{median rank} = \frac{n + 1}{2} = \frac{9 + 1}{2} = \frac{10}{2} = 5$$
Step 3: Identify the median in the data set. The 5th value in the ordered array is 53.
Hence the median age is 53 years.
Example 2: The daily rates of a sample of eight employees at DRL Inc. are 550php, 420php,
560php, 500php, 700php, 670php, 860php, and 480php. Find the median daily rate.
Solution:
Step 1: Arrange the data in order.
420, 480, 500, 550, 560, 670, 700, 860
Step 2: Select the middle rank value.
$$\text{median rank} = \frac{n + 1}{2} = \frac{8 + 1}{2} = \frac{9}{2} = 4.5$$
Step 3: Identify the median in the data set. The 4.5th rank falls between the 4th value (550) and
the 5th value (560), so the median is the average of these two values.
$$\text{median} = \frac{550 + 560}{2} = \frac{1110}{2} = 555$$
Therefore, the median daily rate is 555php.
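The two rules for ungrouped data can be sketched in a few lines. The function name `median_by_rank` is an illustrative choice, and the inputs are the ordered arrays from the two examples.

```python
def median_by_rank(values):
    """Median of ungrouped data using the (n + 1) / 2 rank rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    if n % 2 == 1:
        # Odd n: the rank (n + 1) / 2 is a whole number.
        rank = (n + 1) // 2
        return ordered[rank - 1]
    # Even n: average the two middle-ranked values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


ages = [45, 46, 48, 51, 53, 54, 55, 58, 59]
rates = [420, 480, 500, 550, 560, 670, 700, 860]
print(median_by_rank(ages))   # 53
print(median_by_rank(rates))  # 555.0
```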
MEDIAN FOR GROUPED DATA
The median is the middle value of the frequency distribution: it separates the upper half of the
distribution from the lower half. It is a measure of central tendency because it is the exact center
of the scores in a distribution.
$$\text{median rank} = \frac{N}{2}$$
$$\text{median} = LB + \left(\frac{\frac{N}{2} - cf}{f}\right) i$$
where: LB = lower boundary of the median class
N = total number of values (sample or population size)
cf = cumulative frequency before the median class
f = frequency of the median class
i = class interval (class width)
Example 1: DRL Cosmetic Agency, a nationwide cosmetic agency, offers special rates during the
summer period. The owner wants additional information on the ages of the people taking beauty
tours. A random sample of 50 customers who took a beauty tour last summer revealed these ages.
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Determine the median of the frequency distribution of the ages of the 50 people taking tours. Given
the table:
Class Limits    Frequency (f)
18-26           3
27-35           5
36-44           9
45-53           14
54-62           11
63-71           6
72-80           2

Solution
Step 1: Determine the rank of the median.
$$\text{median rank} = \frac{N}{2} = \frac{50}{2} = 25$$
Step 2: Construct a cumulative frequency (cf) column in the table.
Class Limits    Frequency (f)    cf
18-26           3                3
27-35           5                8
36-44           9                17
45-53           14               31
54-62           11               42
63-71           6                48
72-80           2                50
Step 3: Identify the median class by locating the 25th rank in the table.
Class Limits    Frequency (f)    cf
18-26           3                3
27-35           5                8
36-44           9                17
45-53           14               31    (median class)
54-62           11               42
63-71           6                48
72-80           2                50
The class 45-53 covers the 18th to the 31st rank in the frequency distribution, so the 25th rank
belongs to this class.
Step 4: Determine the values of LB, cf, f, i, and N.
LB = 45 - 0.5 = 44.5
cf = 17
f = 14
i = 27 - 18 = 9 (or i = 35 - 26 = 9)
N = 50

Step 5: Compute the value of the median.

$$\text{median} = LB + \left(\frac{\frac{N}{2} - cf}{f}\right) i$$
$$\text{median} = 44.5 + \left(\frac{\frac{50}{2} - 17}{14}\right) 9$$
$$\text{median} = 44.5 + \left(\frac{25 - 17}{14}\right) 9$$
$$\text{median} = 44.5 + 5.14$$
$$\text{median} = 49.64$$
Thus, the median is 49.64. Observe that the median falls within the class boundaries of the
median class (44.5 to 53.5).
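The five steps above can be sketched as a single function. The name `grouped_median` and its argument layout are illustrative, and the sketch assumes integer class limits (so each lower boundary is the lower limit minus 0.5) and a common class width.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a frequency distribution with equal-width classes.

    lower_limits: lower class limits in ascending order.
    freqs: class frequencies, in the same order.
    width: class interval i.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    target = n / 2        # rank of the median
    cf = 0                # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:
            lb = lower - 0.5  # lower boundary of the median class
            return lb + (target - cf) / f * width
        cf += f
    raise ValueError("median class not found")


lower_limits = [18, 27, 36, 45, 54, 63, 72]
freqs = [3, 5, 9, 14, 11, 6, 2]
print(round(grouped_median(lower_limits, freqs, 9), 2))  # 49.64
```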
PROPERTIES OF MODE
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode or even no mode in any given data set.
4. The mode is not affected by extremely small or large values.
5. The mode can be applied to nominal, ordinal, interval, and ratio data.
6. The mode is not always unique. A data set can have more than one mode, or the mode may not
exist for a data set.
7. It can be used for qualitative as well as quantitative data.
8. It is not affected by extreme values.
9. The mode can be used when the data are nominal or categorical, such as religious preference,
gender, or political affiliation.
10. It cannot be manipulated algebraically: modes of subgroups cannot be combined.
If all the observations of the series are equal to a constant k, the mode will also be k (as will the
mean and the median).
x: 5, 5, 5, 5, 5, 5, 5
Mean = 5
Median = 5
Mode = 5

11. Change of origin and change of scale. Both a change of origin and a change of scale directly
affect the mode (similar to the mean and the median).
Given $y = 2 + 2x$ and mode of $x = 2$, find the mode of $y$.
Solution:
$$y = 2 + 2x$$
$$y = 2 + 2(2)$$
$$y = 6$$
Mode of $y = 6$

WHEN TO USE THE MODE


The mode is used when you want to know the most frequent response, number, or observation in
a distribution.
The mode is the value in a data set that appears most frequently. Like the median and unlike the
mean, the mode is not affected by extreme values in a data set. A data set may not contain any mode
if none of the values is “most typical”.
A data set that has only one value occurring with the greatest frequency is said to be unimodal. If
the data set has two values with the same greatest frequency, both values are considered modes and
the data set is bimodal. If a data set has more than two modes, it is said to be multimodal. When all
the values in a data set occur with the same frequency, the data set is said to have no mode.
MODE FOR UNGROUPED DATA
Example 1: The following data represent the total unit sales for PSP 2000 from a sample of 10
Gaming Centers for the month of August: 15, 17, 10, 12, 13, 10, 14, 10, 8, and 9. Find the mode.
The ordered array for these data is 8, 9, 10, 10, 10, 12, 13, 14, 15, 17.
Because 10 appears 3 times, more often than any other value, the mode is 10.
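Finding the mode amounts to counting how often each value occurs. The sketch below assumes the ten-value sample in which 10 occurs three times; the function name `modes` and the convention of returning an empty list for "no mode" are illustrative choices.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] when there is no mode.

    Following the lesson, a data set in which every distinct value occurs
    with the same frequency is treated as having no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


sales = [15, 17, 10, 12, 13, 10, 14, 10, 8, 9]
print(modes(sales))            # [10]  (unimodal)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []  (no mode)
```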
MODE FOR GROUPED DATA
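Example 2: Compute the mode of the test scores.
Scores    Frequency
41-45     1
36-40     8
31-35     8
26-30     14
21-25     7
16-20     2
$$\text{mode} = lb_{mo} + \left(\frac{D_1}{D_1 + D_2}\right) i$$
where: $lb_{mo}$ = lower boundary of the modal class
$D_1$ = frequency of the modal class minus the frequency of the next lower class
$D_2$ = frequency of the modal class minus the frequency of the next higher class
$i$ = class interval

1. Modal class: 26-30
2. $lb_{mo} = 26 - 0.5 = 25.5$
3. $D_1 = 14 - 7 = 7$
4. $D_2 = 14 - 8 = 6$
5. $i = 21 - 16 = 5$
$$\text{mode} = 25.5 + \left(\frac{7}{7 + 6}\right) 5$$
$$\text{mode} \approx 28.19$$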
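The grouped-mode formula can be sketched as below. The function name is illustrative; the sketch assumes classes listed in ascending order with integer limits, a single modal class, and, as an added assumption not stated in the lesson, a frequency of 0 for a missing neighbouring class.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of a frequency distribution: lb_mo + D1 / (D1 + D2) * i."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of modal class
    f_below = freqs[k - 1] if k > 0 else 0               # next lower class
    f_above = freqs[k + 1] if k < len(freqs) - 1 else 0  # next higher class
    d1 = freqs[k] - f_below
    d2 = freqs[k] - f_above
    if d1 + d2 == 0:
        raise ValueError("modal class is not higher than its neighbours")
    lb_mo = lower_limits[k] - 0.5  # lower boundary of the modal class
    return lb_mo + d1 / (d1 + d2) * width


# Test scores from Example 2, rearranged in ascending order of class.
lower_limits = [16, 21, 26, 31, 36, 41]
freqs = [2, 7, 14, 8, 8, 1]
print(round(grouped_mode(lower_limits, freqs, 5), 2))  # 28.19
```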
LESSON 6: RANGE
After going through this module, you are expected to compute the range for ungrouped and
grouped data.

Compute the following:


1. The age of your Father minus the age of your mother.
2. The oldest of your siblings minus the youngest of your siblings.
3. The oldest of your friends minus the youngest of your friends.

Probably the simplest measure of dispersion to determine is the range. The range is the difference
between the highest value and the lowest value in the data set.
The range has two advantages: it is easy to compute and easy to understand. It also has two
disadvantages: it can be distorted by a single extreme value (or outlier), and only two values are
used in its calculation.
UNGROUPED DATA
$$R = HV - LV$$
Example 1: The daily rates of a sample of eight employees at DRL Inc. are 550php, 420php,
560php, 500php, 700php, 670php, 860php, and 480php. Find the range.
Step 1: Determine the highest value and the lowest value in the data set.
Highest Value (HV) = 860php    Lowest Value (LV) = 420php
Step 2: Solve for the range.
Range = Highest Value (HV) - Lowest Value (LV)
= 860 - 420
= 440
The range of the daily rates is 440php.
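The two steps of the range computation map directly onto `max` and `min`. This is a minimal sketch using the eight daily rates from the example; the variable names are illustrative.

```python
# Daily rates (in php) of the eight employees in the example.
rates = [550, 420, 560, 500, 700, 670, 860, 480]

highest = max(rates)  # HV
lowest = min(rates)   # LV
data_range = highest - lowest

print(highest, lowest, data_range)  # 860 420 440
```

Replacing 860 with a much larger value changes the range by the same amount even though the other seven rates are untouched, which illustrates how a single outlier distorts the range.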
GROUPED DATA
$$R = HB - LB$$
where: HB = highest class boundary
LB = lowest class boundary

Example 2:
Step 1: Compute the class boundaries.
Class Limit    Frequency    Class Boundaries
6-8            5            5.5-8.5
9-11           6            8.5-11.5
12-14          7            11.5-14.5
15-17          4            14.5-17.5
18-20          3            17.5-20.5

Step 2: Identify the lowest and the highest class limits. Using the class limits alone:
$$R_I = 20 - 6$$
$$R_I = 14$$

Step 3: Compute the range from the lowest and highest class boundaries.
$$R_{(CLCB)} = 20.5 - 5.5$$
$$R_{(CLCB)} = 15$$

The range is 15.
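For grouped data the same idea applies to the class boundaries. The sketch below assumes integer class limits, so each boundary lies 0.5 beyond the limit; the function name is an illustrative choice.

```python
def grouped_range(class_limits):
    """Range of grouped data: highest boundary minus lowest boundary.

    class_limits: list of (lower_limit, upper_limit) pairs with integer limits.
    """
    lowest_boundary = min(lo for lo, _ in class_limits) - 0.5
    highest_boundary = max(hi for _, hi in class_limits) + 0.5
    return highest_boundary - lowest_boundary


classes = [(6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]
print(grouped_range(classes))  # 15.0
```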

Directions: Solve the following problems. Show your complete solutions.
1. The ages of the employees at DRL Inc. are 23, 45, 56, 43, 28, 32, 58, 40, 42, 60, and 58. Find the
range.
2. Find the range using grouped data.
Class Limit Frequency Class Boundaries
10-14 8
15-19 5
20-24 6
25-29 3
30-34 9
LESSON 9: QUARTILE
After going through this module, you are expected to compute the quartiles for ungrouped and
grouped data.

Let’s Warm Up
Do the activities below:
1. List down your friends, and from the list take 25% of them.
2. List down your favorite foods, and from the list take 25% of them.
3. List down your favorite movies, and from the list take 25% of them.

When presenting or analyzing a data set, it is sometimes helpful to group subjects into several
equal groups. For example, to create four equal groups we need the values that split the data
such that 25% of the observations are in each group. These cutoff points are called quartiles, and
there are three (3) of them (the middle one is also called the median). The general term for such
cutoff points is quantiles. Other quantiles likely to be encountered are deciles, which split the data
into 10 parts, and percentiles (also called centiles), which split the data into 100 parts. Quartiles
can also be expressed as percentiles: for example, the lowest quartile is also the 25th percentile,
and the median is the 50th percentile or the 5th decile.
UNGROUPED DATA

$$Q_k = \frac{k(N + 1)}{4}$$

where: $Q_k$ = rank (position) of the kth quartile in the ordered data
N = number of values
k = quartile number (1, 2, or 3)
Example 1: Find the first, second, and third quartiles of the ages of 9 middle-management
employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
Step 1: Arrange the data in order.
45, 46, 48, 51, 53, 54, 55, 58, 59
Step 2: Find the ranks of the first, second, and third quartiles.
$$Q_1 = \frac{1(N + 1)}{4} = \frac{1(9 + 1)}{4} = \frac{10}{4} = 2.5$$

$$Q_2 = \frac{2(N + 1)}{4} = \frac{2(9 + 1)}{4} = \frac{20}{4} = 5$$

$$Q_3 = \frac{3(N + 1)}{4} = \frac{3(9 + 1)}{4} = \frac{30}{4} = 7.5$$

Step 3: Identify the first, second, and third quartile values in the data set. The 5th value in the
ordered array is 53. The 2.5th rank falls between 46 and 48, and the 7.5th rank falls between 55
and 58, so the first and third quartiles are the averages of these pairs of values.
$$Q_1 = \frac{46 + 48}{2} = \frac{94}{2} = 47 \qquad Q_3 = \frac{55 + 58}{2} = \frac{113}{2} = 56.5$$

Therefore, $Q_1 = 47$, $Q_2 = 53$, and $Q_3 = 56.5$.
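The rank rule for ungrouped quartiles can be sketched as below. Following the worked example, a fractional rank is resolved by averaging the two neighbouring values; this is a simplification for ranks ending in .25 or .75, where other textbooks interpolate instead. The function name is illustrative, and the sketch assumes at least three values.

```python
import math


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the rank k(N + 1) / 4."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    ordered = sorted(values)
    if len(ordered) < 3:
        raise ValueError("need at least three values")
    rank = k * (len(ordered) + 1) / 4
    below = ordered[math.floor(rank) - 1]  # value at the rank rounded down
    above = ordered[math.ceil(rank) - 1]   # value at the rank rounded up
    return (below + above) / 2


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
print(quartile(ages, 1), quartile(ages, 2), quartile(ages, 3))  # 47.0 53.0 56.5
```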

GROUPED DATA

$$Q_k = LB + \left(\frac{\frac{kN}{4} - cf}{f}\right) i$$

where: $Q_k$ = kth quartile
N = number of values
k = quartile number (1, 2, or 3)
LB = lower boundary of the quartile class
f = frequency of the quartile class
cf = cumulative frequency before the quartile class
i = class interval
Example 2: DRL Travel Agency, a nationwide local travel agency, offers special rates during the
summer period. The owner wants additional information on the ages of people taking travel
tours. A random sample of 50 customers who took travel tours last summer revealed these ages.
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Step 1: Tabulate the class limits, frequencies, and cumulative frequencies.
Class Limit    Frequency    cf
18-26          3            3
27-35          5            8
36-44          9            17
45-53          14           31
54-62          11           42
63-71          6            48
72-80          2            50
Total          50
Step 2: Determine the rank of $Q_1$.
$$Q_1 \text{ rank} = \frac{N}{4} = \frac{50}{4} = 12.5$$
Step 3: Identify the $Q_1$ class by locating the 12.5th rank in the table. The class 36-44 covers the
9th to the 17th rank, so the 12.5th rank belongs to this class.

Step 4: Determine the values of LB, cf, f, i, and N.
LB = 36 - 0.5 = 35.5
cf = 8
f = 9
i = 9
N = 50
Step 5: Compute the value of the first quartile.
$$Q_1 = LB + \left(\frac{\frac{N}{4} - cf}{f}\right) i = 35.5 + \left(\frac{\frac{50}{4} - 8}{9}\right) 9 = 35.5 + 4.5 = 40$$

Thus, $Q_1$ is 40. Observe that $Q_1$ falls within the class boundaries of the $Q_1$ class.

Note: Apply the same procedure to obtain the values of $Q_2$ and $Q_3$.

Locate the second quartile rank: $\frac{2N}{4} = \frac{2(50)}{4} = 25$
The 25th rank belongs to the class 45-53, so LB = 45 - 0.5 = 44.5, cf = 17, and f = 14.
$$Q_2 = LB + \left(\frac{\frac{2N}{4} - cf}{f}\right) i = 44.5 + \left(\frac{\frac{2(50)}{4} - 17}{14}\right) 9 = 44.5 + 5.14 = 49.64$$

Locate the third quartile rank: $\frac{3N}{4} = \frac{3(50)}{4} = 37.5$
The 37.5th rank belongs to the class 54-62, so LB = 54 - 0.5 = 53.5, cf = 31, and f = 11.
$$Q_3 = LB + \left(\frac{\frac{3N}{4} - cf}{f}\right) i = 53.5 + \left(\frac{\frac{3(50)}{4} - 31}{11}\right) 9 = 53.5 + 5.32 = 58.82$$

Hence, $Q_2$ is 49.64 and $Q_3$ is 58.82.
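The grouped quartile procedure can be sketched as one function that locates the quartile class and applies the formula. The function name and argument layout are illustrative, and the sketch assumes integer class limits and a common class width of 9.

```python
def grouped_quartile(lower_limits, freqs, width, k):
    """k-th quartile (k = 1, 2, 3) of an equal-width frequency distribution."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    target = k * n / 4    # rank of the k-th quartile
    cf = 0                # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:
            lb = lower - 0.5  # lower boundary of the quartile class
            return lb + (target - cf) / f * width
        cf += f
    raise ValueError("quartile class not found")


lower_limits = [18, 27, 36, 45, 54, 63, 72]
freqs = [3, 5, 9, 14, 11, 6, 2]
q1, q2, q3 = (grouped_quartile(lower_limits, freqs, 9, k) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 40.0 49.64 58.82
```

With k = 2 the function reproduces the grouped median, since the second quartile and the median share the rank N/2.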
