Basic Concepts in Statistics-Aggie
BERLITA Y. DISCA
Math/Physics Dept, CNSM
MSU-General Santos City
I. Introduction
One of the important areas in applied mathematics is statistics. Statistics deals with
scientific methods of collecting, organizing, summarizing, presenting, and analyzing data. It
helps in drawing valid conclusions and making reasonable decisions on the basis of the analysis at
hand. The things known or assumed are called data, and these serve as the basis for inference.
Thus, we have statistics on board exam passers and topnotchers, arson statistics, vehicular
accident statistics, crime statistics, etc.
The basic concern in the study of statistics is the presentation and interpretation of chance
outcomes that occur in a planned or scientific investigation.
The two major areas of statistics are descriptive statistics and inferential statistics.
Descriptive statistics comprises those methods concerned with collecting and describing a set of
data so as to yield meaningful information. It provides information only about the collected data
and in no way draws inferences or conclusions concerning a larger set of data. The construction
of tables, charts, graphs, and other relevant computations in various newspapers and magazines
usually falls under this method. The other method is statistical inference, or inferential
statistics. This method deals with the analysis of a subset of data leading to predictions or
inferences about the entire set of data. The main functions of statistical inference are the
estimation of population parameters and tests of hypotheses.
II. Probability
Probability is a measure of the chance that something will happen. The concepts of probability
theory originated in the study of games of chance. These theories are used extensively in areas
such as statistics, mathematics, science, and philosophy to draw conclusions about the likelihood
of potential events and the underlying mechanics of complex systems.
In probability we need to define the following: Sample Space – the set of all possible outcomes
in a statistical experiment, denoted by S; Sample Point – each outcome in a sample space, also
called an element; Event – a subset of a sample space; and the Null Space – the subset of the
sample space that contains no elements, denoted by $\emptyset$.
We also need to know the operations used on events. The Intersection of two events, $A \cap B$,
is the event containing all elements common to both A and B. Two events are Mutually Exclusive
if $A \cap B = \emptyset$, that is, A and B have no common elements. The Union of two events,
$A \cup B$, is the event containing all the elements that belong to A or to B or to both. The
Complement of an event A with respect to S (symbolized by A') is the set of all elements in S
outside of A.
If one experiment has m outcomes and another experiment has n outcomes, then there are mn
possible outcomes for the two experiments.
Example 1. If an experiment consists of drawing a letter from an English alphabet and tossing a
pair of coins, find the total sample points.
If there are p experiments and the first has $n_1$ possible outcomes, the second has $n_2$ possible
outcomes, the third $n_3$, ..., and the pth $n_p$, then there are a total of
$n_1 \cdot n_2 \cdot n_3 \cdots n_p$ possible outcomes for the p experiments.
Example 2. A 4-bit binary word is a sequence of 4 digits, each of which may be either a 0 or a 1.
How many different 4-bit words are there?
(2)(2)(2)(2) = 16
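The multiplication rule can be checked by listing the sample points directly. The sketch below is only an illustration, assuming Python's standard library; the variable names are chosen here and do not come from the notes. It enumerates the sample space of Example 1 and the 4-bit words of Example 2.

```python
from itertools import product
from string import ascii_uppercase

# Example 1: one letter (26 outcomes) combined with a pair of coins (2 x 2 outcomes).
coin_pairs = list(product("HT", repeat=2))
example1 = list(product(ascii_uppercase, coin_pairs))
print(len(example1))  # 26 * 4 sample points

# Example 2: each of the 4 positions holds a 0 or a 1.
words = ["".join(bits) for bits in product("01", repeat=4)]
print(len(words))     # 2 * 2 * 2 * 2 words
```

Enumeration is practical only for small sample spaces; for larger ones the product $n_1 \cdot n_2 \cdots n_p$ is computed directly.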
Example 2. How many different ordered samples of size 5 can we form from the letters A, B, C, D, E?
5! = 5 · 4 · 3 · 2 · 1 = 120
The number of ways of selecting and arranging r objects from n distinct objects, called the
number of permutations of n objects taken r at a time, is given by the formula
$_nP_r = P(n, r) = \frac{n!}{(n-r)!}$
Example 3. How many different ordered samples of size 3 can we form from the letters A, B, C, D, E?
$_nP_r = P(5, 3) = \frac{5!}{(5-3)!} = 60$
The number of arrangements of n objects of which $n_1$ are alike of one kind, $n_2$ are alike of
another kind, and so forth, is
$\frac{n!}{n_1!\, n_2!\, n_3! \cdots n_k!}$, where $n_1 + n_2 + n_3 + \cdots + n_k = n$.
Example 4. How many distinct permutations can be made from the letters of the word
“agriculturist”? The word has 13 letters, and the letters r, i, u, and t each appear twice, so
$\frac{13!}{2!\, 2!\, 2!\, 2!} = 389,188,800$.
Example 5. In how many ways can 5 people be arranged at a round table?
$(n-1)! = (5-1)! = 4! = 24$
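The three counting formulas above can be verified numerically. This is a minimal sketch assuming Python 3.8 or later (for `math.perm`); the helper names `distinct_permutations` and `circular_arrangements` are made up for this illustration.

```python
from collections import Counter
from math import factorial, perm

# Example 3: ordered samples of size 3 from 5 distinct letters.
print(perm(5, 3))

def distinct_permutations(word):
    """n! divided by the factorial of each letter's multiplicity."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

# Example 4: r, i, u and t each occur twice in "agriculturist".
print(distinct_permutations("agriculturist"))

def circular_arrangements(n):
    """Arrangements of n distinct objects around a circle: (n - 1)!."""
    return factorial(n - 1)

# Example 5: five people at a round table.
print(circular_arrangements(5))
```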
The number of combinations of n distinct objects taken r at a time is
$_nC_r = C(n, r) = \frac{n!}{r!\,(n-r)!}$
Example 6. In how many ways can a committee of 4 be selected at random from a group of 6 men
and 8 women?
a. with no restriction.
$C(n, r) = C(14, 4) = \frac{14!}{4!\,10!} = 1001$
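b. with two men and two women if a certain man must be on the committee.
$C(1, 1) = \frac{1!}{1!\,0!} = 1$ (a certain man must be on the committee)
$C(5, 1) = \frac{5!}{1!\,4!} = 5$ (another man must be on the committee)
$C(8, 2) = \frac{8!}{2!\,6!} = 28$ (two women must be on the committee)
By the multiplication rule, the committee can be formed in (1)(5)(28) = 140 ways.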
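Example 6 can be reproduced with the standard-library function `math.comb` (Python 3.8 or later). The sketch below is illustrative only; the variable names are chosen here. Part b multiplies the three separate choices together, as the multiplication rule requires.

```python
from math import comb

# a. Any 4 of the 14 people (6 men + 8 women).
no_restriction = comb(14, 4)

# b. The named man, one of the other 5 men, and 2 of the 8 women.
with_named_man = comb(1, 1) * comb(5, 1) * comb(8, 2)

print(no_restriction, with_named_man)
```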
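Probability of an Event:
Properties:
1. $0 \le P(A) \le 1$
2. $P(\emptyset) = 0$
3. $P(S) = 1$
4. If A and B are disjoint, then
$P(A \cup B) = P(A) + P(B)$.
In general, $P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$ for mutually exclusive events.
5. $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of A
6. The Addition Law: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$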
Examples:
7. A coin is tossed twice. What is the probability that at least 1 head occurs?
8. If a card is drawn out of an ordinary deck of cards, find the probability that it is a heart.
9. From a box containing 3 green apples, 5 red apples, and 6 yellow apples, an apple is
drawn. What is the probability that the apple is
a. red
b. green or yellow
10. The probability that a student will pass his Statistics course is 2/3 and the probability
that he passes English 10 is 4/9. If the probability of passing both courses is 14/45, what is
the probability that he will pass at least one course?
11. A pair of dice is tossed. Find the probability that the sum is
a. 6
b. 8 or 10
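When all sample points are equally likely, the probability of an event is the number of favorable points divided by the size of the sample space. The sketch below applies this to Examples 7 and 11 by enumeration and applies the addition law to Example 10. The helper `probability` is a name invented for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(event) for equally likely sample points."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

# Example 7: at least one head in two tosses of a coin.
coins = list(product("HT", repeat=2))
at_least_one_head = probability(lambda o: "H" in o, coins)

# Example 11: sums on a pair of dice.
dice = list(product(range(1, 7), repeat=2))
sum_is_6 = probability(lambda o: sum(o) == 6, dice)
sum_is_8_or_10 = probability(lambda o: sum(o) in (8, 10), dice)

# Example 10: addition law, P(A or B) = P(A) + P(B) - P(A and B).
pass_at_least_one = Fraction(2, 3) + Fraction(4, 9) - Fraction(14, 45)

print(at_least_one_head, sum_is_6, sum_is_8_or_10, pass_at_least_one)
```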
Conditional Probability
Let A and B be two events with $P(B) > 0$. The conditional probability of A given B is defined to
be
$P(A / B) = \frac{P(A \cap B)}{P(B)}$
Example 12. A math teacher gave her class two tests. 25% of the class passed both tests and 42%
of the class passed the first test. What percent of those who passed the first test also passed the
second test?
Solution: Let A be the event that a student passed the second test and let B be the event that a
student passed the first test. Then A/B is the event that a student passed the second test given
that the student passed the first test.
$P(A / B) = \frac{P(A \cap B)}{P(B)} = \frac{0.25}{0.42} \approx 0.60 = 60\%$
Multiplication Law
$P(A \cap B) = P(A / B)\, P(B)$.
Example 13. A box contains three red balls and 2 blue balls. Two balls are selected without
replacement. What is the probability that they are both red?
Solution: Let A and B denote the events that a red ball is drawn on the first and second trial,
respectively.
$P(A) = \frac{3}{5}$. If a red ball has been removed on the first trial, there are 2 red and 2 blue
balls left, so $P(B / A) = \frac{2}{4} = \frac{1}{2}$. By the multiplication law,
$P(A \cap B) = P(B / A)\, P(A) = \frac{1}{2} \cdot \frac{3}{5} = \frac{3}{10}$.
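Example 13 can also be checked by listing every ordered draw of two balls without replacement. This is a sketch under the assumption that the balls are labelled individually (the labels R1, R2, R3, B1, B2 are invented here) so that all ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "R3", "B1", "B2"]          # hypothetical labels
draws = list(permutations(balls, 2))            # ordered draws, no replacement

first_red = [d for d in draws if d[0].startswith("R")]
both_red = [d for d in first_red if d[1].startswith("R")]

p_a = Fraction(len(first_red), len(draws))            # P(A)
p_b_given_a = Fraction(len(both_red), len(first_red)) # P(B / A)
p_both = Fraction(len(both_red), len(draws))          # P(A and B)

# Multiplication law: P(A and B) = P(B / A) P(A).
print(p_a, p_b_given_a, p_both, p_both == p_b_given_a * p_a)
```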
When we do research, we need to know the sampling procedures and how we are going to go about
collecting the data. Here are some terms whose meanings we need to know. A Population consists
of the totality of the observations with which we are concerned, while a subset of it is known as
a Sample. A numerical value that describes a characteristic of the population is called a
Parameter, and a Statistic is any numerical value that describes a characteristic of the sample.
involved. The technique is inferential when it involves estimation of population parameters and
tests of hypotheses.
MEASUREMENT
Measurement refers to the process of determining the value or label, either qualitatively or
quantitatively, of a particular variable for a particular unit of analysis.
Levels of Measurement
Level Characteristics
1. Nominal - numbers or symbols are used simply to classify an object, person, or
characteristic into categories
- the categories must be distinct, non-overlapping, and exhaustive
- weakest level of measurement
Example: Religious Affiliation:
Roman Catholic Iglesia Ni Cristo Baptist
Alliance Islam Others
2. Ordinal - contains the property of the nominal level
- the numbers assigned to categories of any variable may be ranked or ordered in
some low-to-high manner
Example: Blood Pressure: High Normal Low/ Anemic
3. Interval - contains the properties of the ordinal level
- the distances between any two numbers on the scale are of known sizes
- characterized by a common and constant unit of measurement
- units of measurement are arbitrary
- the number zero does not imply the absence of the characteristic under
consideration (thus, the zero point is arbitrary)
Examples: Temperature in °C and °F; Intelligence quotient (75, 100, etc.)
4. Ratio - contains the properties of the interval level
- has a true zero point, that is, the number zero indicates the absence of the
characteristic under consideration
- strongest level of measurement
Examples: Scores in a test; Height in meters, feet, etc.
Nonprobability Sampling
a procedure wherein not all the sampling elements in the population are given
a chance of being included in the sample
Methods of Nonprobability Sampling
Purposive Sampling sets out to make a sample agree with the population
in regard to certain characteristics
1. Simple Random Sampling (SRS) - is the process of selecting a sample of size n, giving
each sampling unit an equal chance of being included in the sample. An SRS of n
observations of the population is a sample that is chosen in such a way that each subset of
n observations of the population has an equal chance of being selected.
Advantages
The theory involved is easier to understand than the theory behind other sampling designs
Estimation methods are simpler and easier.
Disadvantages
The sample chosen may be widely spread, thus entailing high transportation costs.
A population list, or frame, is needed.
The sample chosen may not be truly typical of the population if the population is
heterogeneous with respect to the characteristic under study.
2. Stratified Sampling - There are cases wherein the population consists of items that are
heterogeneous with respect to the characteristic under study. In such situations the
population should be divided, or stratified, into more or less homogeneous
subpopulations or strata before sampling is done.
Stratified random sampling then consists of selecting a SRS from each of the strata into which
the population has been divided.
Advantages
Stratification may bring about a gain in precision of the estimates of characteristics of the
population.
It allows a more comprehensive data analysis since information is provided in each
stratum.
It is administratively convenient.
Disadvantages
A listing of the population for each stratum is needed.
The stratification of the population may require additional prior information about the
population and its strata.
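The two designs described above can be sketched in a few lines. The example is only an illustration: the frame of 100 numbered units, the two strata, and the allocation of 6 and 4 units are hypothetical, and `random.sample` from the standard library is used because it draws without replacement and gives every subset of the requested size the same chance.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n from the population list (frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw an SRS from each stratum; both arguments are keyed by stratum name."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

# Hypothetical frame of 100 numbered units split into two strata.
frame = list(range(1, 101))
strata = {"low": frame[:60], "high": frame[60:]}

srs = simple_random_sample(frame, 10, seed=1)
strat = stratified_sample(strata, {"low": 6, "high": 4}, seed=1)
print(srs)
print(strat)
```

Both functions need a list of the units, which is the "population list, or frame" named among the disadvantages.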
Advantages
Drawing of the sample is administratively easy.
It is possible to select a sample in the field without a frame.
Disadvantage
If periodic irregularities are found in the list, a systematic sample may consist only of
similar types.
The number of clusters C in population is called the size of the population clusters. Clusters may
be of equal or unequal sizes.
Advantages
A population list is not needed.
Listing cost is reduced.
Disadvantages
The costs and problems of statistical analysis are greater.
Estimation procedures are difficult.
IV. Presentation of Data
After deciding which sampling procedure to use and collecting the data, the next step is
to present the data.
Measure of Central Location or Measure of Central Tendency- any measure indicating the center
of a set of data, arranged in an increasing or decreasing order of magnitude. The most commonly
used are the mean, median, and mode.
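If the set of data $x_1, x_2, \ldots, x_N$, not necessarily all distinct, represents a finite
population of size N, and $x_1, x_2, \ldots, x_n$ represents a sample of size n, then the
population mean and the sample mean are given respectively by
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ and $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$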
MEDIAN:
MODE
The mode of a set of observations is that value which occurs most often or with the greatest
frequency. The mode does not always exist. This is certainly true when all observations occur
with the same frequency.
Remarks:
1. The mean is the most commonly used measure of location in statistics. It is easy to
calculate and it employs all the available information. The disadvantage of the mean is that it
is adversely affected by extreme values.
2. The median is easy to compute if the number of observations is relatively small. It is not
affected by extreme values.
3. The mode is the least used measure of the three. Its value is almost useless for small sets of
data. It requires no calculation. It can be used for both quantitative and qualitative data.
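The remarks above can be seen on a small data set. The sketch below uses Python's `statistics` module on hypothetical values chosen for illustration; the single extreme value 40 pulls the mean upward while the median and mode stay put. `multimode` (Python 3.8 or later) returns every most-frequent value, so when all observations occur equally often it returns all of them, which corresponds to the case where the notes say the mode does not exist.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 8, 40]   # hypothetical sample with one extreme value

print(mean(data))       # pulled toward the extreme value 40
print(median(data))     # middle value of the ordered data
print(multimode(data))  # most frequent value(s)

# Every value occurs once: all values tie, so there is no single mode.
print(multimode([1, 2, 3]))
```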
2. MEASURES OF VARIATION
The measures of central location do not give an adequate description of our data. We need to
know how the observations spread out from the average. The most important statistics for
measuring the variability of a set of data are the range and the variance:
RANGE
The range of a set of data is the difference between the largest and smallest numbers in the set.
The range is very simple to compute. However, it is a poor measure of variation, particularly if
the size of the sample is large. It considers only the extreme values and tells us nothing about
the distribution of the numbers in between.
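POPULATION VARIANCE:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
SAMPLE VARIANCE
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
or, in computational form,
$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$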
The standard deviation provides a method for converting observed variances to standard form so
that they can be more easily understood and compared. The variance and standard deviation
provide the most powerful estimate of variation because they consider the value of each score.
Remarks:
1. The range is the least reliable of the measures and is used only when one is in a hurry to
get a measure of variability. It may be used for ordinal, interval, or ratio data.
2. The most important measures of variability are the standard deviation and its square, the
variance. The variance is the average of the squared deviations around the mean.
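The definitional and computational forms of the sample variance give the same value, which can be confirmed on a small data set. The data below are hypothetical, and `sample_variance_shortcut` is a name chosen for this sketch; `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by N.

```python
from math import sqrt
from statistics import pvariance, variance

def sample_variance_shortcut(xs):
    """Computational form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

data = [4, 8, 6, 5, 3, 7]            # hypothetical sample

data_range = max(data) - min(data)   # uses only the two extreme values
s2 = sample_variance_shortcut(data)  # agrees with statistics.variance
s = sqrt(s2)                         # standard deviation

print(data_range, s2, variance(data), pvariance(data), s)
```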
Important characteristics of a large mass of data can be readily assessed by grouping the data into
different classes and then determining the number of observations that fall in each class. Such an
arrangement in tabular form is called a frequency distribution.
Data that are presented in the form of a frequency distribution are called grouped data. The data
of a sample are often grouped into intervals to produce a better overall picture of the unknown
population, but in so doing the identity of the individual observations is lost.
The smallest and largest values that can fall in a class interval are called class limits. The
number of observations falling in a particular class interval is called the class frequency. The
numerical difference between the upper and lower class boundaries of a class interval is defined
to be the class width. The midpoint of the class interval, called the class mark, is the average
of the class limits.
The total frequency of all values less than the upper class boundary of a given class interval is
called the cumulative frequency up to and including that class.
The steps in grouping a large set of data into a frequency distribution may be summarized as
follows:
1. Decide on the number of class intervals required. (We can choose between 5 and 20 class
intervals)
2. Determine the range.
3. Divide the range by the number of classes to estimate the approximate width of the
interval.
4. List the lower class limit of the bottom interval and then the lower class boundary. Add
the class width to the lower class boundary to obtain the upper class boundary. Write
down the upper class limit.
5. List all the class limits and class boundaries.
6. Determine the class marks.
7. Tally the frequencies for each class.
8. Sum the frequency column and check against the total number of observations
Example 1: The following are the final examination scores in an elementary statistics course:
85 49 23 60 79 32 57 74 52 70 82 36
80 77 81 95 41 65 92 85 55 76 33 92
52 10 64 75 78 25 80 98 81 67 68 24
41 71 83 54 64 72 88 62 74 43 79 83
60 78 89 76 84 48 84 90 15 79 84 66
34 67 17 82 69 74 63 80 85 61 89 67
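The grouping steps listed above can be sketched as follows for the scores in Example 1. The choice of 9 classes and the function name `frequency_distribution` are assumptions made for this illustration, not part of the notes. The width is rounded up with integer arithmetic so that the chosen number of classes always covers the largest score.

```python
scores = [
    85, 49, 23, 60, 79, 32, 57, 74, 52, 70, 82, 36,
    80, 77, 81, 95, 41, 65, 92, 85, 55, 76, 33, 92,
    52, 10, 64, 75, 78, 25, 80, 98, 81, 67, 68, 24,
    41, 71, 83, 54, 64, 72, 88, 62, 74, 43, 79, 83,
    60, 78, 89, 76, 84, 48, 84, 90, 15, 79, 84, 66,
    34, 67, 17, 82, 69, 74, 63, 80, 85, 61, 89, 67,
]

def frequency_distribution(data, num_classes):
    """Group integer data into equal-width classes, lowest class first."""
    low, high = min(data), max(data)
    # Steps 2-3: range divided by the number of classes, rounded up.
    width = (high - low) // num_classes + 1
    rows, cumulative = [], 0
    for i in range(num_classes):
        lower = low + i * width            # lower class limit
        upper = lower + width - 1          # upper class limit
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,
            "f": freq,
            "cf": cumulative,
        })
    return rows

table = frequency_distribution(scores, 9)   # 9 classes is an assumed choice
for row in table:
    print(row)
```

Step 8 of the procedure corresponds to checking that the last cumulative frequency equals the number of observations.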
GRAPHICAL REPRESENTATIONS
The information provided by a frequency distribution in tabular form is easier to grasp if
presented graphically. A visual picture is beneficial in understanding the essential features of
frequency distribution.
Bar Chart – plotting the class frequency against the class limits.
Cumulative Frequency Polygon or Ogive – cumulative frequency against the upper class
boundaries
GROUPED DATA: FORMULAS FOR MEASURES OF CENTRAL TENDENCY
MEAN
$\bar{x} = \frac{\sum f_i x_i}{n}$
where: $f_i$ = the frequency of class interval i
$x_i$ = the midpoint of class interval i
$\sum f_i x_i$ = the sum of the products of the frequency and the midpoint of each class
interval
or by the use of codes:
$\bar{x} = A + \left(\frac{\sum f_i u_i}{n}\right) c$
where A = the midpoint of the class interval assigned a code of zero
$f_i$ = the frequency of class interval i
$u_i$ = the code of class interval i
c = the class width
$\sum f_i u_i$ = the sum of the products of the frequency and the code of the class intervals
MEDIAN
$\text{Median} = L_1 + \left(\frac{\frac{n}{2} - cf}{f}\right) c$
where $L_1$ = the lower class boundary of the median class
n = the sample size
cf = the cumulative frequency of the class right below the median class
f = the frequency of the median class
c = the class width
median class = the class interval where the (n/2)th observation falls
MODE
$\text{Mode} = L_1 + \left(\frac{d_1}{d_1 + d_2}\right) c$
where $L_1$ = the lower class boundary of the modal class
$d_1$ = the difference between the frequencies of the modal class and the class right
below the modal class
$d_2$ = the difference between the frequencies of the modal class and the class right
above the modal class
c = the class width
modal class = the class interval having the highest frequency.
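VARIANCE
$s^2 = \frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)}$
or by the use of codes:
$s^2 = c^2 \left[\frac{n \sum f_i u_i^2 - \left(\sum f_i u_i\right)^2}{n(n-1)}\right]$
where: $f_i$ = the frequency of class interval i
$u_i$ = the code of class interval i
c = the class width
$\sum f_i u_i$ = the sum of the products of the frequency and the code of the class
interval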
The quartile deviation is used when the median is used as the average, that is, when the data
depart noticeably from the normal. It is used for ordinal data.
In symbols: $Q = \frac{Q_3 - Q_1}{2}$
where: $Q_1 = L_1 + \left(\frac{\frac{n}{4} - cf}{f}\right) c$, $Q_3 = L_1 + \left(\frac{\frac{3n}{4} - cf}{f}\right) c$
Class interval    f
36-40             2
31-35             8
26-30            12
21-25            18
16-20            10
Complete the table and find the mean, median, mode, variance, sd, Q1 , Q3, and Q.
Make the bar chart, histogram, frequency polygon, and ogive.
Solution:
Class interval    f   Class boundaries   Class mark x    u    fu   fu^2    fx    fx^2   cf
36-40             2   35.5-40.5          38              2     4      8    76    2888   50
31-35             8   30.5-35.5          33              1     8      8   264    8712   48
26-30            12   25.5-30.5          28              0     0      0   336    9408   40
21-25            18   20.5-25.5          23             -1   -18     18   414    9522   28
16-20            10   15.5-20.5          18             -2   -20     40   180    3240   10
Total            50                                          -26     74  1270   33770
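a) MEAN
$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{1270}{50} = 25.4$
or by the use of codes:
$\bar{x} = A + \left(\frac{\sum f_i u_i}{n}\right) c = 28 + \left(\frac{-26}{50}\right)(5) = 28 - 2.6 = 25.4$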
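b) MEDIAN $= L_1 + \left(\frac{\frac{n}{2} - cf}{f}\right) c = 20.5 + \left(\frac{\frac{50}{2} - 10}{18}\right)(5) = 24.67$
c) MODE $= L_1 + \left(\frac{d_1}{d_1 + d_2}\right) c = 20.5 + \left(\frac{8}{8 + 6}\right)(5) = 23.36$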
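d) Variance
$s^2 = \frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)} = \frac{50(33770) - (1270)^2}{50(49)} = \frac{1688500 - 1612900}{2450} = \frac{75600}{2450} = 30.86$
or by the use of codes:
$s^2 = c^2 \left[\frac{n \sum f_i u_i^2 - \left(\sum f_i u_i\right)^2}{n(n-1)}\right] = 5^2 \left[\frac{50(74) - (-26)^2}{50(49)}\right] = 25 \left(\frac{3700 - 676}{2450}\right) = 30.86$
e) Standard Deviation
$s = \sqrt{30.86} = 5.56$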
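f) $Q_1 = L_1 + \left(\frac{\frac{n}{4} - cf}{f}\right) c = 20.5 + \left(\frac{\frac{50}{4} - 10}{18}\right)(5) = 20.5 + 0.69 = 21.19$
g) $Q_3 = L_1 + \left(\frac{\frac{3n}{4} - cf}{f}\right) c = 25.5 + \left(\frac{\frac{3(50)}{4} - 28}{12}\right)(5) = 25.5 + 3.96 = 29.46$
h) $Q = \frac{Q_3 - Q_1}{2} = \frac{29.46 - 21.19}{2} = 4.14$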
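The grouped-data formulas can be applied programmatically to the same frequency table. The sketch below is one possible arrangement, not a prescribed method: classes are listed from the lowest upward, and the helper `position_value` (a name invented here) interpolates inside the class containing the n/2, n/4, or 3n/4 position. Results may differ from the hand computation in the last decimal place because no intermediate rounding is done.

```python
from math import sqrt

# (lower limit, upper limit, frequency), lowest class first.
classes = [(16, 20, 10), (21, 25, 18), (26, 30, 12), (31, 35, 8), (36, 40, 2)]

freqs = [f for _, _, f in classes]
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
n = sum(freqs)
width = classes[0][1] - classes[0][0] + 1        # class width c

sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
mean = sum_fx / n
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
sd = sqrt(variance)

def position_value(position):
    """L1 + ((position - cf) / f) * c for the class containing the position."""
    cf = 0
    for lo, _, f in classes:
        if cf + f >= position:
            return (lo - 0.5) + (position - cf) / f * width
        cf += f

median = position_value(n / 2)
q1 = position_value(n / 4)
q3 = position_value(3 * n / 4)
quartile_deviation = (q3 - q1) / 2

# Mode: interpolate inside the class with the highest frequency.
i = max(range(len(freqs)), key=freqs.__getitem__)
d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)
mode = (classes[i][0] - 0.5) + d1 / (d1 + d2) * width

print(mean, median, mode, variance, sd, q1, q3, quartile_deviation)
```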
The Measure of Skewness is a value that measures the degree of departure of the distribution
from symmetry. The formula for solving the coefficient of skewness (SK) is defined as
$SK = \frac{3(\bar{x} - Md)}{s}$
where Md is the median.
Remark:
The distribution of the data is symmetric when SK = 0. It is negatively skewed when SK
< 0 and it is positively skewed when SK > 0.
The Measure of Kurtosis is a value that measures the flatness or peakedness of a frequency
distribution. It is computed using the formula
$K = \frac{\frac{1}{n} \sum_{i=1}^{k} (x_i - \bar{x})^4 f_i}{s^4}$
For a normal distribution K = 3, the distribution is mesokurtic. If K > 3, the distribution is
leptokurtic; and if K < 3, the distribution is platykurtic.
There is a need to measure the degree of association of variables to express quantitatively the
extent to which they are related.
In determining the association, two sets of measurements are obtained on the same individuals or
units, or on pairs of individuals or units who are matched on some basis. The association is then
tested for significance.
                 Variable X
Variable Y       0       1       Total
0                A       B       A+B
1                C       D       C+D
Total            A+C     B+D     N
$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$
Testing for Significance
$\chi^2 = \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}$
which follows a chi-square distribution with one degree of freedom.
Decision Rule
Reject the null hypothesis and conclude significant association when the p-value
associated with the statistic is less than or equal to the prescribed level of significance.
Example: A significant relation between students' career choices in college and their fathers'
occupations has been hypothesized by a number of researchers. Do the following data support the
hypothesis? What is the index of relationship?
$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} = \frac{171 - 11}{\sqrt{(20)(20)(30)(10)}} = 0.462$
The chi-square test is used to determine if two categorical variables are dependent or not.
It tests the association of two variables A and B, A having r categories and B having k
categories.
Remarks:
There should be no empty cells.
No more than 20% of the cells must contain an expected frequency less than 5.
Null Hypothesis
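$\chi^2 = \sum_i \sum_j \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$
where $n_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell (i, j).
Reject Ho and conclude that there exists a significant association between A and B if the
value of the test statistic is greater than the critical value obtained using (r-1)(k-1) degrees
of freedom at a given level of significance.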
Example : A random sample of 200, all retired, were classified according to education and
number of children. The results are:
We first determine the expected frequencies, using the row and column totals. We obtain the
following:
18.68    39.84    24.49
17.55    37.44    23.01
 8.78    18.72    11.51
Observed   Expected   n_ij - E_ij   (n_ij - E_ij)^2 / E_ij
14         18.675     -4.68         1.17
19         17.55       1.45         0.12
12          8.775      3.23         1.19
37         39.84      -2.84         0.20
42         37.44       4.56         0.56
17         18.72      -1.72         0.16
32         24.485      7.52         2.31
17         23.01      -6.01         1.57
10         11.505     -1.51         0.20
$\chi^2$ = 7.46
At the 5% level of significance and 4 degrees of freedom, the critical value is 9.488; hence there
is insufficient evidence to show that family size is associated with the level of education of the father.
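The computation above can be sketched in code. The observed counts are those listed in the Observed column; arranging them as a 3 x 3 table whose rows sum to 83, 78, and 39 is an assumption made here so that the expected frequencies match the ones shown, and the function name is chosen for this illustration. The critical value 9.488 is the one quoted in the text.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency = (row total)(column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Assumed layout of the observed counts (rows and columns unlabeled here).
observed = [
    [14, 37, 32],
    [19, 42, 17],
    [12, 17, 10],
]

statistic, df, expected = chi_square_independence(observed)
critical_value = 9.488             # 5% level, 4 degrees of freedom (from the text)
reject = statistic > critical_value
print(round(statistic, 2), df, reject)
```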
3. Cramer Coefficient C
The Cramer Coefficient C is computed as $C = \sqrt{\frac{\chi^2}{N(L-1)}}$
where $\chi^2$ is the value of the chi-square statistic given above, N is the total number of
observations, and L is the smaller of the number of rows and the number of columns.
To test the null hypothesis that the true association is zero, the chi-square statistic $\chi^2$
obtained is used. This statistic follows a chi-square distribution with
(r-1) x (k-1) degrees of freedom.
Decision Rule
Reject Ho and conclude that there exists a significant association between A and B if the
value of the test statistic is greater than the critical value obtained from the table of critical
values of a chi-square distribution at the specified level of significance.
Moderate 10 10 8 28
Authoritarian 15 11 5 31
Total 32 30 27 89
To compute the value of the index we need to solve for the chi-square statistic. So we
determine the expected frequencies first.
Expected Frequencies (E)
10.79    10.11     9.10
10.07     9.44     8.49
11.15    10.45     9.40
We may construct the following table:
Observed   Expected   n_ij - E_ij   (n_ij - E_ij)^2 / E_ij
 7         10.79      -3.79         1.33
10         10.07      -0.07         0.00
15         11.15       3.85         1.33
 9         10.11      -1.11         0.12
10          9.44       0.56         0.03
11         10.45       0.55         0.03
14          9.10       4.90         2.64
 8          8.49      -0.49         0.03
 5          9.40      -4.40         2.06
$\chi^2$ = 7.58
Note: r = k =3 so L=3
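With the chi-square value, the total of 89 observations, and L = 3 from the note, the Cramer coefficient follows from its definition as the square root of chi-square over N(L - 1). The function name `cramer_c` is chosen for this sketch, and the resulting value is computed here rather than quoted from the notes.

```python
from math import sqrt

def cramer_c(chi_square, n, rows, cols):
    """Cramer coefficient: sqrt(chi^2 / (N (L - 1))), L = min(rows, cols)."""
    smaller = min(rows, cols)
    return sqrt(chi_square / (n * (smaller - 1)))

c = cramer_c(chi_square=7.58, n=89, rows=3, cols=3)
print(round(c, 3))
```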
4. CORRELATION
A graph of the paired scores will enable the researcher to judge the nature of the correlation.
Such graphical presentation of scores on the two variables is called a scatterplot.
A correlation coefficient equal to zero implies the absence of a linear relationship but not
the total absence of a relationship.
The Spearman coefficient determines the degree of correlation between two variables
measured in at least an ordinal scale.
Data Layout: Let $(x_i, y_i)$ be the ranks of the ith unit for variables X and Y
Variable X    $x_1$  $x_2$  . . .  $x_n$
Variable Y    $y_1$  $y_2$  . . .  $y_n$
The Spearman correlation coefficient is $r_s = 1 - \frac{6 \sum (x_i - y_i)^2}{n(n^2 - 1)}$
Testing for Significance
To test the null hypothesis that the true correlation is zero, we compute the test statistic
$t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}$; for sample size less than 30
Decision Rule
Reject the null hypothesis and conclude significant correlation when t or z is greater than
the critical value obtained from the t or z distribution at a specified level of significance. For the
critical value t the number of degrees of freedom is n-2.
Example: Suppose ten presidential candidates are ranked by two civic action groups as to
their platforms on the issue of unemployment. The data are given below.
Civic Group        Candidate
                   A    B    C    D    E    F    G    H    I    J    Total
X                  5   10    7    2    3    4    1    9    8    6
Y                  3    9    8    1    2    5    4   10    7    6
(xi - yi)^2        4    1    1    1    1    1    9    1    1    0    20
$r_s = 1 - \frac{6 \sum (x_i - y_i)^2}{n(n^2 - 1)} = 1 - \frac{6(20)}{10(100 - 1)} = 1 - 0.121 = 0.879$
and $t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}} = \frac{0.879 \sqrt{10 - 2}}{\sqrt{1 - 0.879^2}} = 5.21$
At the 5% level and 8 degrees of freedom, the tabulated t-value is 1.860. The t-statistic obtained
exceeds the critical value. Hence the hypothesis is rejected and we conclude that there is a
significant correlation between the rankings of the two civic groups.
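The Spearman coefficient and its t statistic can be computed directly from the two rows of ranks in the table. The sketch below assumes there are no tied ranks, which is the case the formula covers; the function names are chosen for this illustration, and the critical value 1.860 is the one quoted in the text.

```python
from math import sqrt

x = [5, 10, 7, 2, 3, 4, 1, 9, 8, 6]    # ranks from civic group X
y = [3, 9, 8, 1, 2, 5, 4, 10, 7, 6]    # ranks from civic group Y

def spearman(ranks_x, ranks_y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid when there are no ties."""
    n = len(ranks_x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

def t_statistic(r, n):
    """t = r sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
r_s = spearman(x, y)
t = t_statistic(r_s, len(x))
print(sum_d2, round(r_s, 3), round(t, 2), t > 1.860)
```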
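$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}}$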
Testing for Significance
To test the null hypothesis that the true correlation is zero, we compute the test statistic
$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$; for sample size less than 30
Decision Rule
Reject the null hypothesis and conclude significant correlation when t or z statistic is
greater than the critical value obtained from the t or z distribution at a specified level of
significance. For the critical value t the number of degrees of freedom is n-2.
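$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}} = \frac{12(2864.25) - (92.5)(368)}{\sqrt{\left[12(723.25) - (92.5)^2\right]\left[12(14068) - (368)^2\right]}}$
r = 0.163
and $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.163 \sqrt{12 - 2}}{\sqrt{1 - 0.163^2}} = 0.522$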
The t statistic obtained does not exceed the tabulated value at 5% level of significance and n-2 =
10 degrees of freedom. Hence the hypothesis is not rejected.
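Because the example supplies only the summary sums, the correlation can be computed from them without the raw data. The sketch below takes n = 12 and the sums appearing in the worked computation (sum of x = 92.5, sum of y = 368, sum of xy = 2864.25, sum of x squared = 723.25, sum of y squared = 14068); the function name is invented for this illustration. The t value differs slightly from the hand computation because r is not rounded first.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational form of the correlation coefficient r."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

n = 12
r = pearson_from_sums(n, 92.5, 368, 2864.25, 723.25, 14068)
t = r * sqrt(n - 2) / sqrt(1 - r * r)
print(round(r, 3), round(t, 3))
```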
The point-biserial coefficient measures the association between a continuous variable X and
a dichotomous variable Y. Here, the dichotomous variable is coded as either 0 or 1. The
coefficient is computed as
To test the null hypothesis of no association, we compute for the test statistic
$t = \frac{r_{pb} \sqrt{n - 2}}{\sqrt{1 - r_{pb}^2}}$
Decision Rule
Reject the null hypothesis and conclude significant correlation when t statistic is greater
than the tabulated t-value with n-2 degrees of freedom.
Example: Given a small class of seven students, four boys and three girls, determine if there
exists a relationship between math scores and sex based on the data below.
At 0.05 level the tabulated t-value with 5 degrees of freedom is 2.571. Hence the null
hypothesis is not rejected.
The Kendall’s coefficient of concordance measures the extent of agreement among several
judges in rating a set of n different objects. The ratings given by judges are usually of
ordinal scale.
Data Layout : Assume there are k=3 judges and n=6 objects
5 3 5 1 9 2.25
6 6 2 4 12 2.25
$\sum R_i = 63$     $\sum (R_i - \bar{R})^2 = 63.5$
$\bar{R} = 63/6 = 10.5$
Null Hypothesis
There is no concordance among the judges, that is, the ratings given by the judges
have no agreement.
Test Statistic
$\chi^2 = k(n-1)W$, which follows a chi-square distribution with n-1
degrees of freedom
Decision Rule
Reject the null hypothesis and conclude a significant concordance among the judges if
the chi-square statistic exceeds the tabulated value at a specified level of significance and n-1
degrees of freedom.
Example: (using the data set in the data layout above)
$W = \frac{12 \sum (R_i - \bar{R})^2}{k^2 n (n^2 - 1)} = \frac{12(63.5)}{9(6)(36 - 1)} = 0.403$
and $\chi^2 = k(n-1)W = 3(5)(0.403) = 6.045$.
For a level of significance of, say, $\alpha$ = 0.05 (5%), with n-1 = 5 degrees of freedom, the
value obtained from the table is 11.07. The test statistic is less than the critical value so the
hypothesis is not rejected. Hence there is no significant concordance among the judges.
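The coefficient of concordance and its chi-square statistic can be computed from the sum of squared deviations of the rank sums. Since the full table of rank sums is not reproduced here, the sketch takes the sum 63.5 from the text as an input; the function name and argument names are chosen for this illustration.

```python
def kendalls_w(sum_sq_dev, k, n):
    """W = 12 S / (k^2 n (n^2 - 1)), S = sum of squared deviations of rank sums."""
    return 12 * sum_sq_dev / (k ** 2 * n * (n ** 2 - 1))

k, n = 3, 6                       # 3 judges, 6 objects
w = kendalls_w(63.5, k, n)
chi_square = k * (n - 1) * w      # compared with the table value at n - 1 df
critical_value = 11.07            # 5% level, 5 degrees of freedom (from the text)
print(round(w, 3), round(chi_square, 3), chi_square > critical_value)
```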
2. In how many ways can a judge award first, second, and third places in a contest with eight
entries?
3. Ten people are to sit at a round table. Find the number of seating arrangements if the host and
the hostess must always sit together.
4. Six married couples have bought 12 seats in a row for a premiere showing of Spiderman III. In
how many ways can they be seated if
a) each couple is to sit together? b) all the men are to sit to the right of all the women?
5. How many ways are there to hire 3 applicants from 5 equally qualified recent graduates for a
teaching job in a certain state university?
7. Five different mathematics books, four different physics books and two different chemistry
books are to be arranged on a shelf so that the mathematics books stand together, physics books
stand together, and chemistry books stand together. How many such arrangements are possible?
8. From 4 mathematicians and 3 physicists find the number of committees that can be formed
consisting of 2 mathematicians and 1 physicist.
9. If a letter is chosen at random from the English alphabet, find the probability that the letter is a
consonant.
11. A card is drawn from a deck of 52 cards. Find the probability that it will be a heart or a king.
12. A call center urgently needs four new graduates of business courses. These 4 applicants are
selected at random from 5 accountancy graduates, 3 economics graduates, and 1 management
graduate. What is the probability that the management graduate will be hired?
13. A coin and a die are tossed, what is the probability of obtaining a tail and at least a 4?
14. From 6 male and 4 female examinees, a committee of 3 persons is selected at random, what
is the probability of having 2 male and 1 female in a committee?
15. What is the probability of randomly selecting the letter M in the word COLUMN?
16. A bag contains 8 red marbles, 6 blue marbles, and 10 white marbles. A marble is drawn from
the bag. What is the probability that the marble
a. is red? b. is not blue? c. is blue or red?
17. If a letter is chosen at random from the word PARKING; find the probability that it is a
a. vowel? b. consonant?
18. Choose a number at random from 1 to 5. What is the probability of each outcome? What is
the probability that the number chosen is even? What is the probability that the number chosen is
odd?
19. If a card is drawn from a deck of cards, what is the probability that it is
a. a spade? b. a heart or diamond?
20. If a pair of dice is tossed, what is the probability that the sum is
a. 10? b. 6 or 11?
21. The probability that a student will go to the library is 0.35, the probability that he will attend
his class is 0.75, and the probability that he will go to the library and attend his class is 0.68.
What is the probability that he will
a. not go to the library? b. either go to the library or attend his class?
22. The probability that when a student visits his girlfriend, he will bring chocolates is 0.6, and
the probability that he will bring flowers is 0.7. If the probability that he will bring either
chocolates or flowers is 0.5, what is the probability that he will
a. bring both chocolates and flowers? b. not bring either chocolates or flowers?
23. A jar contains black and white marbles. Two marbles are chosen without replacement. The
probability of selecting a black marble and then a white marble is 0.34, and the probability of
selecting a black marble on the first draw is 0.47. What is the probability of selecting a white
marble on the second draw, given that the first marble drawn was black?
24. The probability that it is Friday and that a student is absent is 0.03. Since there are 5 school
days in a week, the probability that it is Friday is 0.2. What is the probability that a student is
absent given that today is Friday?
25. At Kennedy Middle School, the probability that a student takes Technology and Spanish is
0.087. The probability that a student takes Technology is 0.68. What is the probability that a
student takes Spanish given that the student is taking Technology?
SAMPLE PROBLEMS
Directions: Read each question below. Encircle the letter of your answer.
3. Which of the following experiments does NOT have equally likely outcomes?
6. In New York State, 48% of all teenagers own a skateboard and 39% of all teenagers own a
skateboard and roller blades. What is the probability that a teenager owns roller blades given that
the teenager owns a skateboard?
8. In the United States, 56% of all children get an allowance and 41% of all children get an
allowance and do household chores. What is the probability that a child does household chores
given that the child gets an allowance?
9. In Europe, 88% of all households have a television. 51% of all households have a television
and a VCR. What is the probability that a household has a VCR given that it has a television?
10. In New England, 84% of the houses have a garage and 65% of the houses have a garage and
a back yard. What is the probability that a house has a backyard given that it has a garage?
11. Which of the following is the sample space when 2 coins are tossed?
12. At Kennedy Middle School, 3 out of 5 students make honor roll. What is the probability that
a student does not make honor roll?
13. A large basket of fruit contains 3 oranges, 2 apples and 5 bananas. If a piece of fruit is chosen
at random, what is the probability of getting an orange or a banana?
15. In a class of 30 students, there are 17 girls and 13 boys. Five are A students, and three of
these students are girls. If a student is chosen at random, what is the probability of choosing a
girl or an A student?
16. In the United States, 43% of people wear a seat belt while driving. If two people are chosen
at random, what is the probability that both of them wear a seat belt?
17. Three cards are chosen at random from a deck without replacement. What is the probability
of getting a jack, a ten and a nine in order?
a) 8/16,575 b)1/2197 c) 6/35152 d) none of the above
18. A city survey found that 47% of teenagers have a part time job. The same survey found that
78% plan to attend college. If a teenager is chosen at random, what is the probability that the
teenager has a part time job and plans to attend college?
19. In a school, 14% of students take drama and computer classes, and 67% take drama class.
What is the probability that a student takes computer class given that the student takes drama
class?
20. In a shipment of 100 televisions, 6 are defective. If a person buys two televisions from that
shipment, what is the probability that both are defective?
21. It is the graphical representation that is very useful for illustrating set operations.
22. A committee of 5 pupils is to be selected randomly from a group of 5 male and 10 female
pupils. Find the probability that the committee will consist of 2 male and 3 female pupils.
23. During 4 successive years, Mr. Santos purchased oil for his furnace at respective costs of
1.83, 1.92, 1.25 and 1.45 cents per gallon. What was the average cost of oil over the 4-year
period?
25. Which of the following is not correct? A frequency distribution can be presented as
26. The kurtosis of a symmetrical curve is 2.56. The curve therefore is:
28. The following figures represent the ages of the members of De Guzman family. What is the
average age for the family?
Father: 45 yrs
Mother: 41 yrs and 7 mos
Maggie: 14 yrs and 3 mos
Brian: 12 yrs and 6 mos
a) 28 yrs and 4 mos b) 30 yrs c) 28 yrs and 6 mos d) 35 yrs and 4 mos
29. You have calculated the correlation coefficients between two variables to be -.95. This would
indicate
30. In a certain city, the distribution of the 1st and 2nd-born children of 2 children families by
gender is shown below
Second born
First born Female Male Total
What is the probability that a family with 2 children selected at random in this city has
children of the same gender?
STATISTICS AND PROBABILITY
1. The score of 8 students were 75, 78, 73, 82, 87, 89, 93, 95. What was the average score of
the students?
a. 82 b. 83 c. 84 d. 44
3. Out of 100 numbers, sixteen were 5’s, twenty-one were 6’s, thirty were 7’s, and the rest
were 8’s. Find the arithmetic mean of the numbers.
a. 6.5 b. 6.8 c. 7.0 d. 7.1
4. Thirty students in a class averaged 80% on a certain exam. Twenty others averaged 90%.
What is the class average?
a. 84 b. 85 c. 86 d. 87
5. If the average annual income of 15 workers is P66,000 and six of the workers each made
P30,000 for the year, what is the average annual income of the remaining 9 workers?
a. P66,000 b. P50,000 c. P30,000 d. P90,000
6. Anna has an average of 87% on 5 exams in Statistics. What must she get in the sixth
exam to average 88% on the six exams?
a. 87 b. 88 c. 93 d. 95
7. The average of 15 numbers is 7. Angelo is adding these numbers and mistakenly reads 6
of the numbers as 9 instead of 7, what average will he get?
a. 7.6 b. 7.8 c. 8 d. 8.2
9. Two coins are tossed. How many possible outcomes are there?
a. 2 b. 4 c. 8 d. 16
10. Two dice are tossed. How many possible outcomes are there?
a. 12 b. 24 c. 36 d. 42
11. How many numbers of 4 different digits each greater than 5000 can be formed from the
digits 1, 2, 3, 4, 5, 6, 7?
a. 30 b. 60 c. 90 d. 360
12. How many 2-digit numbers of two different digits can be formed from the numbers 0, 2,
4, 6, 8?
a. 12 b. 16 c. 20 d. 25
13. A committee of 7 is to be selected from 8 seniors and 5 juniors. In how many ways can this
be done if the committee must be composed of 4 seniors?
a. 8C4 • 5C4 b. 8C7 • 5C0 c. 8C4 • 5C3 d. 5C4 • 8C3
14. In how many ways can the judges in the Bb. Pilipinas pageant choose the Philippine
representatives to the Miss Universe and Miss World beauty contests from among 5
finalists?
a. 5 b. 10 c. 20 d. 25
15. In how many ways can the first, second and third places be chosen from a group of 10
contestants?
a. 27 b. 30 c. 720 d. 1000
16. In how many ways can 4 persons be seated in a round table?
a. 4 b. 6 c. 24 d. 30
17. A multiple choice test consists of 5 questions, each with 3 possible answers, only one of
which is correct. In how many ways can a student answer the 5 questions and get them all
correct?
a. 1 b. 2 c. 5 d. 15
18. How many straight lines can be drawn, given 5 points no three of which are collinear?
a. 5 b. 10 c. 15 d. 20
20. A group of 3 boys and 2 girls is seated in a row of 5 chairs. Find the probability that they will
be seated alternately.
a. 3!2!/5! b. 2/5 c. 3/5 d. 1/5
21. Find the probability of getting a head or a tail in a single toss of a coin?
a. 1 b. 1 c. 1 d. 1
4 2 8
22. A pair of dice is tossed. If one die shows a 5, what is the probability that the other die
shows a
a. 1/36 b. 1/6 c. 1/11 d. 1/12
23. A card is drawn from a deck. What is the probability that the card drawn is an ace?
a. 2/13 b. 1/2 c. 1/13 d. 5/26
24. One card is drawn at random from 100 cards numbered 1 to 100. What is the probability that
the number on the card is divisible by 5?
a. 1/2 b. 1/3 c. 1/4 d. 1/5
25. A basket contains 20 apples, three of which are rotten. If an apple is selected at random, what is the
probability that it is good?
a. 17/20 b. 3/20 c. 17/23 d. 3/23
26. A card is drawn from a well shuffled deck of 52 cards. What is the probability that a card
drawn is a face card?
a. 1/4 b. 3/13 c. 3/52 d. 1/2
27. A box contains 5 red balls, 3 green balls and 4 blue balls. If two balls are drawn in succession
without replacement, what is the probability that both are red?
a. (5/12)(3/11) b. (5/12)(5/12) c. (5/12)(4/11) d. (4/12)(3/11)
28. A student has 4 different books. In how many ways can he arrange them in a bookshelf?
a. 4 b. 8 c. 12 d. 24
29. In how many ways can 3 boys and 4 girls sit in a row of 7 chairs if the boys and the girls
alternate?
a. 3! 4! b. 2! c. 2! 3! 4! d. 8!
30. Eight children are to be seated in a round table. In how many ways can they be seated?
a. 5! b. 6! c. 7! d. 8!
32. There are 100 envelopes in a box. Of these, 40 contain P50, 30 contain P100, 20 contain P500
and 10 contain P1000. If one draws an envelope at random from the box, what is his expectation?
a. P16 b. P412 c. P200 d. P250
33. Find the value of $\sum_{i=1}^{10} i^2$.
a. 1 b. 10 c. 11 d. 55
34. Find the value of $\sum_{i=1}^{4} i^2$.
a. 1 b. 4 c. 17 d. 30
35. If $\sum_{i=1}^{10} x_i = 34$, $x_1 = 3$, and $x_{10} = 7$, find $\sum_{i=1}^{9} x_i$.
a. 31 b. 24 c. 28 d. can’t be determined
37. A pair of dice are rolled. Find the probability that the total on the two dice is not 8.
a. 31/36 b. 5/36 c. 29/36 d. 7/36
38. A group of six members gathered for special meeting. Each member has to shake hands with all the
other members. Find the total number of handshakes made?
a. 6 b. 12 c. 10 d. 15
39. From five married couples, 4 people are selected at random. Find the probability of selecting one
woman and 3 men.
a. (5P1 • 5P3)/10P4 b. 10!/(5!5!) c. (5C1 • 5C3)/10C4 d. 10C4/10!
40. In how many ways can 10 different marbles be divided among Anton, Bobby and Charles so that 5 are
given to Anton, 3 to Bobby and 2 to Charles?
Math only 7
Science only 9
History only 14
Math and Science only 8
Math and History only 3
Science and History only 26
All subjects 14
41. How many are not taking any of the subjects?
a. 14 b. 19 c. 26 d. 31
44. A player rolls two dice. He wins if and only if the first die shows an even number or if the two dice
show a sum of 9. Find his probability of winning?
a. 1/2 b. 2/3 c. 5/9 d. 1/3
45. In a board exam, the probability that an examinee will pass each of the three subjects is 0.60. What is
the probability that an examinee will pass at most three subjects?
a. 0.064 b. 0.216 c. 0.784 d. 0.936
46. Find the number of ways two 1-peso coins and eight 5-peso coins can be given to street children if
each child gets a coin.
a. 15P15 b. 15P15 c. 15! d. 15!
2!5!8! 2!8!
48. A box contains 5 red, 6 white and 5 blue balls. Two balls are chosen at random. What is the
probability that they are both white?
a. 1/8 b. 3/11 c. 5/11 d. 16/11
49. Twenty-one tickets numbered from 1 to 21 are in a box. If two tickets are drawn at random, determine
the probability that both are odd.
a. 11/42 b. 121/441 c. 16/49 d. 10/21
50. Seventeen tickets numbered from 1 to 17 are in box. If two tickets are drawn at random, determine
the probability that the first one is odd and the second one is given.
a. 1 b. 9 c. 1 d. 81
34 4 289
24. A student received a grade of 84 on a final exam in Math for which the average grade was 76
and the standard deviation is 10. On the final exam in Physics, for which the mean grade was 82
and the standard deviation was 16, she received a grade of 90. In which subject was her relative
standing higher?
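Relative standing across exams with different means and spreads is usually compared through the standard score z = (x - mean) / standard deviation. A small Python sketch using the figures from the item above (the helper name `z_score` is an illustrative choice):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

z_math = z_score(84, 76, 10)     # Math: grade 84, mean 76, sd 10
z_physics = z_score(90, 82, 16)  # Physics: grade 90, mean 82, sd 16

# The subject with the larger z-score is where the relative standing is higher.
higher = "Math" if z_math > z_physics else "Physics"
print(z_math, z_physics, higher)
```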
7. The number of 911 emergency calls classified as domestic disturbance calls in a large
metropolitan location was sampled for 30 randomly selected 24-hour periods, with the following
results:
25 46 34 45 37 36 40 30 29 37
44 56 50 47 23 40 30 27 38 47
58 22 29 56 46 46 38 19 49 50
8. Out of 100 numbers, 20 were 4’s, 40 were 5’s and 30 were 6’s. The remainder were 7’s. Find
the average value of the numbers.
9. Over the years, it has been observed that, of the total number of samples covered in the annual survey
of the Philippine Business Industry, 70% are small (with employment size of at most 9 workers).
It is further noted that 90% of small establishments submit their reports while 70% of the large
ones do not. What is the probability that a sample for the survey will submit its report?
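Item 9 can be approached with the law of total probability: P(submit) = P(small) P(submit | small) + P(large) P(submit | large). The sketch below assumes that every establishment that is not small counts as large, so P(large) = 0.30, and that 70% of large ones not submitting means 30% of them do submit.

```python
p_small = 0.70
p_large = 1 - p_small            # assumption: establishments are either small or large
p_submit_given_small = 0.90
p_submit_given_large = 1 - 0.70  # 70% of large ones do not submit

# Law of total probability over the two size groups
p_submit = p_small * p_submit_given_small + p_large * p_submit_given_large
print(round(p_submit, 2))
```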
10. A fair die is tossed twice. Find the probability of getting a 4, 5, or 6 on the first toss and a 1,
2, 3, or 4 on the second toss.
11. It is required to seat 5 men and 4 women in a row so that the women occupy the even places.
How many such arrangements are possible?
12. In a given business venture, a lady can make a profit of P300 with probability 0.6 or take a
loss of P100 with probability 0.4. Determine her expectation.
14. The probability that an entering college student will graduate is 0.4. Determine the probability that, out of 5
students, all will graduate.
15. If the probability of a defective bolt is 0.1, find the (a) mean and the (b) standard deviation
for the distribution of defective bolts in a total of 400.
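If the number of defective bolts is modeled as a binomial random variable (an assumption that treats the 400 bolts as independent trials), the mean is np and the standard deviation is the square root of np(1 - p). A short Python sketch:

```python
import math

n = 400   # number of bolts
p = 0.1   # probability that a bolt is defective

mean = n * p                      # binomial mean: np
sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation: sqrt(np(1 - p))
print(mean, round(sd, 4))
```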
Round 1
1. What is the probability of randomly selecting the letter T in the word COMPUTE?
4. Ben will be given 4 chances to pick up toys at random from a gift bag containing Pinoy
superhero toys: 2 Captain Barbel, 3 Darna and 2 Panday. How many possible collections
of toys can he pull out?
5. Standard deviation of scores in a statistics exam is 10. However, since the exam was
very difficult, the statistics teacher later decided to give a bonus of 5 points to all the
students. What is the standard deviation of the new scores?
a. 10/25 b. 10 c. 15 d. 25
6. In a student body election, where 1800 students voted, the votes for Vi, Guy and Glo are
in the ratio of 4:3:2. If they were the only candidates and none of the 1800 students cast
more than 1 vote, how many votes did Guy receive?
8. In clinical trials of a new skin astringent, 100 women experienced 1st degree burns,
nausea or both. Of these, a total of 35 women experienced 1st degree burns and 25
experienced both burns and nausea. How many experienced nausea?
a. 25 b. 65 c. 90 d. cannot be determined
9. Which of the following statements is true about a truly normal frequency distribution?
10. The Philippine National Statistics Office reports the following median family income (in
thousand pesos) by income decile for the year 2003. An estimate of the 1st quartile is
DECILE 1st 2nd 3RD 4TH 5TH 6TH 7TH 8TH 9TH 10th
2nd Round
1. Fill in the missing words to this quote: “Statistical inference methods may be described
as methods for drawing conclusions about______ based on _______ computed from
the_______ “.
2. You have calculated the correlation coefficients between two variables to be -.95. This
would indicate
3. A candy company claims that 10% of its candies are blue. A random sample of 200 of
these candies is taken, and 16 are found to be blue. Which of the following tests would
be the most appropriate for establishing whether the candy company needs to change its
claim?
a. a matched-pairs t-test
b. one-sample proportion z-test
c. two-sample proportion z-test
d. chi-square test of association
4. Consider the set of test scores from the last year’s Phil. Statistics Quiz elimination round.
Supposing that we double all the scores, which of the following sample statistics will be
changed? The mean, the median, or the standard deviation
5. In a certain city, the distribution of the 1st and 2nd-born children of 2 children
families by gender is shown below
Second born
First born Female Male Total
What is the probability that a family with 2 children selected at random in this city has
children of the same gender?
6. The Phil. National Statistics Office reports the following median family income (in thousand
pesos) by income decile for the year 2003.
DECILE         1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
MEDIAN INCOME  28   43   55   69   86   106  134  175  242  419
Approximately what percentage of the families have incomes between P55,000 and P
134,000?
7. Let us define a new statistic as the difference between the 70th sample percentile and the
30th sample percentile. This new statistic would give us information concerning
8. Workplace accidents are categorized in 3 groups: minor, moderate and severe. The
probability that a given accident is minor is 0.5, that it is moderate is 0.4, and that it is
severe is 0.1. Two accidents occur independently in one month. Calculate the probability
that neither accident is severe and at most one is moderate.
Final Round
1. The central limit theorem is important in statistics because
2. The average time that it takes for a person to experience pain relief from aspirin is 25
minutes. A new ingredient is added to help speed up pain relief. Let U denote the
average time to obtain pain relief. A study is conducted to verify if the new product is
better. Which of the following is the most appropriate null and alternative hypotheses
for this experiment.
a. If the sample size is fixed, the confidence interval gets wider as we increase the
confidence coefficient.
b. If the population standard deviation increases, the confidence interval
decreases in width.
c. A confidence interval for a mean always contains the sample mean.
d. If the confidence coefficient is fixed, the confidence interval gets narrower as we
increase the sample size
5. You have measured the systolic blood pressure of a random sample of 25 employees of
NSO. A 95% confidence interval for the mean systolic blood pressure for the employees
is computed to be (122, 138). Which of the following statements gives a solid
interpretation of this interval?
a. About 95% of the sample of 25 employees has a systolic blood pressure between
122 and 138.
b. About 25% of all NSO employees have a systolic blood pressure between 122 and
138.
c. If the sampling procedure were repeated many times, then approximately 95%
of the resulting confidence intervals would contain the mean systolic blood
pressure of all NSO employees.
d. The probability that the sample mean falls between 122 and 138 is equal to 0.95
6. Ben and Eddie were enrolled in different sections of the same statistics course. Both of
them scored 80% in the finals. However, Eddie’s teacher said that he is in the 60th
percentile in his section while Ben is in the 80th percentile. Which of the following is
correct?
a. Ben’s section generally performed better on the test than Eddie’s section.
b. A person at the 30th percentile in Eddie’s class will be at the 40th percentile in
Ben’s section.
c. A person at the 30th percentile in Ben’s section will be at the 40th percentile in
Eddie’s section.
d. Individuals in Ben’s section generally scored lower on the test than those in
Eddie’s section.
7. The sample mean is an unbiased estimator for the population mean. This means that
8. Some descriptive statistics for a set of test scores that follow a normal distribution are
shown in the table below. For this test, a certain student has a standardized score of Z = -1.2.
What is the new score of this student?
Variable N Mean
score 50 1045.7
Variable Minimum Maximum
score 628.9 1577.1
9. Suppose the examination grades of a large group of statistics students are approximately
normally distributed with mean 70 and standard deviation 10. The instructor wishes to
award quality point grades 1.0, 1.5, 2.0, 2.5 and 3.0 so that the top 10% of the
students would get 1.0’s. What numerical grade should divide the 1.0’s and the 1.5’s?
( Z – table needed)
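In place of a printed Z-table, the cutoff for the top 10% can be read off the inverse normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) with the mean and standard deviation stated in the item; the variable names are illustrative.

```python
from statistics import NormalDist

grades = NormalDist(mu=70, sigma=10)

# The top 10% of students lie above the 90th percentile of the distribution.
cutoff = grades.inv_cdf(0.90)
print(round(cutoff, 1))
```

With a Z-table the same value comes from 70 + z(10), where z is the table entry closest to a cumulative area of 0.90.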
1. What law in statistics states that if an experiment is repeated a large number of times, the
relative frequency of an outcome approaches the probability of the outcome?
a. Das Kapital
b. Domesday Book
c. Principia Mathematica
d. None of these
a. Genetics
b. Taxonomy
c. Biometrics
d. None of these
4. There are 15 points on a plane. Given that no three points are collinear, how many
triangles can be formed from the plane?
a. 210
b. 105
c. 455
d. 273
6. The MMDA reports that in the rainy season, there is an increase in the number of
recorded road accidents especially in Quezon City and Caloocan City. Which of the
following is the best conclusion about this report?
a. That the temperature of the rainy season tends to disrupt the psycho-motor control
of the drivers in Quezon City and Caloocan City.
b. That roads become more slippery, causing the increase of accidents.
c. That trailer trucks on those cities tend not to see signs when the rainy seasons
come.
d. That MMDA officers are much more efficient during the sunny days.
7. In a round table, 7 people are seated. If two people refuse to be replaced on their initial
seats, in how many ways can the round table be set-up?
a. 720
b. 120
c. 24
d. None of these
a. bar graph
b. frequency polygon
c. ogive
d. density graph
9. How many three-digit numbers which are divisible by 5 can be formed from the digits
from 0 to 9 if no repetitions are allowed?
a. 162
b. 810
c. 180
d. 200
10. How many three-digit numbers greater than 800 can be formed from the digits from 0 to
9, if a digit may be repeated?
a. 81
b. 162
c. 200
d. 100
11. On the average, the number of days that an institution is closing because of extreme rains
is 5. What is the probability that this institution would close for 4 days?
a. 0.1462
b. 0.1606
c. 0.1456
d. 0.1342
12. At what point on the plot of the cumulative frequency must the two ogives intersect?
13. Which of the following is true about the given table below?
14. The kurtosis of a symmetrical curve is 2.43. The curve therefore is:
a. platykurtic
b. mesokurtic
c. leptokurtic
d. normal
15. In how many ways can the valedictorian, salutatorian and first honorable mention be
chosen from a senior high school class of 36 students?
a. 1332
b. 630
c. 46620
d. 7140
16. Ars Conjectandi (The Art of Guessing), a book published in 1713, was a book that gave
interest on probabilities concerning gambling, economics and politics. Who wrote this
book?
a. Gregor Mendel
b. Wilhelm Leibniz
c. Jacob Bernoulli
d. Karl Marx
17. If 5 cards are dealt from a standard deck of 52 playing cards, what is the probability that
3 will be spades?
a. 0.0815
b. 0.0961
c. 0.0588
d. 0.7154
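Item 17 is a hypergeometric probability: choose 3 of the 13 spades and 2 of the 39 non-spades, out of all 5-card hands. The sketch below reads "3 will be spades" as exactly 3 spades, which is an interpretation rather than something the item states; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_pmf(k: int, successes: int, failures: int, draws: int) -> float:
    """P(exactly k successes) when drawing without replacement."""
    total = successes + failures
    return comb(successes, k) * comb(failures, draws - k) / comb(total, draws)

# 13 spades, 39 other cards, 5 cards dealt, exactly 3 spades
p_three_spades = hypergeom_pmf(3, 13, 39, 5)
print(round(p_three_spades, 4))
```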
18. Given is a table about the contestants of the 2004 Philippine Statistics Quiz. Assuming
that all contestants are equally intelligent, what is the probability that the PSQ champion
is a male and is an engineering student?
19. There are 15 balls in a box. 3 are red, 5 are blue and 7 are yellow. If 6 balls are to be
taken from the box, what is the probability that 1 is red, 2 are blue and 3 are yellow?
a. 0.0316
b. 0.2879
c. 0.2098
d. 0.3122
20. The Philippine export and import costs in the year 2000 are estimated to be P
12,243,000.00 and P 10,106,000.00 respectively. This data implies that
21. What is the Pearsonian coefficient of skewness of a data from a population whose mean,
median and standard deviation are 23, 22, and 0.9 respectively?
a. 3.45
b. 3.33
c. 0.03
d. 2.89
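A common form of the Pearsonian coefficient of skewness is 3(mean - median) / standard deviation. The sketch below assumes that this median-based form is the one intended by item 21.

```python
def pearson_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's median-based skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

sk = pearson_skewness(mean=23, median=22, sd=0.9)
print(round(sk, 2))
```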
22. If in a certain line, the letters A, B, …, I and J are placed in its alphabetical order, in how
many ways can this line be named?
a. 24
b. 120
c. 45
d. 10
23. The scores of a student in a series of five 20 item tests are: 12, 18, 15, 15 and 17.
Accidentally, the instructor replaced the 17 with a 16. By how much will the standard
deviation of the student’s score change after the replacement?
a. 0.18
b. 0.12
c. 0.13
d. 0.15
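The change in item 23 depends on whether the standard deviation is computed with the sample formula (divisor n - 1) or the population formula (divisor n); the item does not say which. The sketch below computes both with the standard library so the two readings can be compared.

```python
from statistics import pstdev, stdev

original = [12, 18, 15, 15, 17]
replaced = [12, 18, 15, 15, 16]   # the 17 is replaced with a 16

# Sample standard deviation (divisor n - 1)
sample_change = stdev(original) - stdev(replaced)
# Population standard deviation (divisor n)
population_change = pstdev(original) - pstdev(replaced)

print(round(sample_change, 2), round(population_change, 2))
```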
Trial Question
At a pre-school class in Sto. Niño Parochial School, the top three students are considered as
honor students. If the class is composed of 15 students, what portion of the class will not make it
to the honor roll?
a. 60 % b. 40% c. 80% d. 20%
ROUND 1
2. (5 sec) The Bureau of Agricultural Statistics collects data on retail prices of selected
agricultural commodities in major trading centers and reports these prices in the BAS media
service. The report includes the prevailing price, lowest price and highest price. Which of the
following gives a quick measure of the variability of the price data?
a. variance b. range c. fractile d. mean
3. (5 sec) In the memorandum of the secretary of the Department of Agriculture regarding the
weekly updates on prices of palay, rice, and corn, Bureau of Agricultural Statistics reports the
prevailing prices as the modal price of each commodity. If the prices of well-milled rice per
kilogram are P22, P25, P24, P22, P23, and P22, what will be reported as the prevailing price of
well-milled rice?
a. P25 b. P23 c. P24 d. P22
4. (10 sec) If the mean absolute deviation of a data set is zero, what does this suggest?
a. All observations are necessarily equal
b. All observations are necessarily equal to 1.
c. All observations are necessarily equal to 0.
d. There is an equal number of positive and negative observations.
5. (30 sec w/ material) The Speedo Wheel Company makes custom wheels for Toyota. The
wheel sizes tend to follow a normal distribution. The mean wheel width is 6.05 inches and the
standard deviation is 0.05 inch. In order for the wheels to fit properly, the diameter of the wheels
must be between 5.90 and 6.10 inches. What is the probability that a randomly chosen wheel
manufactured by this company will fit properly?
a. 68% b. 84% c. 90% d. 95%
6. (40 sec w/ material) The following are monthly salaries of a group of call center employees in
Manila. (Refer to the given table.) What best describes the mean salary of a call center agent?
a. Mean salary is in the modal class.
b. Mean salary is in the median class.
c. Mean salary is in the class interval with 100 employees.
d. Mean salary is in the class interval with 50 employees.
7. (5 sec) Suppose a person throws a dart at a circular board having a radius of one foot. We
assume that he is bound to hit somewhere on the board. Let X denote the distance (in inches) by
which he misses the center of the board. Describe the random variable X.
8. (30 sec) The committee system is the core of congress law-making, investigative and oversight
function. This is so because much of the business of congress is done in the committee. The
Philippine Senate is composed of 24 senators who make up the membership of the senate
permanent committees. Suppose 9 senators were to be picked from 12 opposition senators and
nine administration senators to form a committee, find the probability that 5 opposition senators
will be on the committee.
9. (10 sec w/ data) Jean was a census enumerator in the 2007 Census of Population conducted by
the NSO in August. In one of her interviews with the households in the barangay in Manila, she
was able to collect the following variables
Table
Jean coded the variables’ relationship to head, sex and highest grade completed with numerical
figures to facilitate data processing. What type of variable do these coded data represent?
10. (5 sec) The birth certificate of a child is a proof of recognition of a new born individual’s
importance to the state and his status under the law. Which of the following sets of information consists of
quantitative data?
A. Information about the child name, sex, place of birth and date of the birth
B. Information about the mother: total number of children born alive, number of
children still living including birth
C. Information about the child name, place of birth, weight at birth
D. Information about the father, name, height, occupation, age at the time of this birth
ROUND 2
1. (5 sec with material) After the International Astronomical Union (IAU) had declassified Pluto
as a dwarf planet in 2006, there remained only eight known planets based on the new definition.
The following table shows the distances of these different planets from the sun.
If you were to compute the mean and the median distances of the eight planets which of the
following statements would be true about the mean and the median?
A. The mean and the median are values that belong in the data set.
B. The mean and the median are values that do not belong in the data set.
C. The mean and the median are computed using all the values in the data set.
D. The mean and the median are computed using only the middle value.
2. (5 sec) Suppose you are certain that 3 of the 5 choices to a particular question are wrong and have to
guess randomly between the remaining choices. What is the probability that you guess the right
answer?
3. (5 sec) If a particular male college student in a class got a percentile score of 35 in an
examination, this means that
4. (10 sec) Consider the following frequency distribution on the height of 200 college varsity
players. If the frequencies in the first and last class intervals are increased, which of the
following will result?
Class Interval Frequency
59-62 4
63-66 26
67-70 18
71-74 36
75-78 56
79-82 38
83-86 16
87-90 6
7. (40 sec) A rock concert producer has asked an outdoor performance for the benefit of cancer
patients of a certain hospital. He has invited 6 bands to perform at an outdoor stage in Manila.
The earnings will be used to purchase medicines, wheelchairs and other materials and equipment
of the hospital.
The producer expects to earn P 290,000 if the weather is fine on the scheduled day. If it is
a rainy day he expects to earn only P 120,000 and if it is a stormy day he expects a P 270,000
loss. Based on the historical record at that time of the year, the weather office has estimated the
chance of a fine day to be 0.60, the chance of a rainy day to be 0.25 and the chance of a stormy
day to be 0.15.
If the producer decides to pursue the concert, will you agree with his decision?
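One way to weigh the producer's decision is the expected earnings, the probability-weighted sum of the three outcomes. A minimal Python sketch with the figures from the item, where the loss on a stormy day is entered as a negative amount:

```python
# (probability, earnings in pesos) for each weather outcome
outcomes = [
    (0.60, 290_000),    # fine day
    (0.25, 120_000),    # rainy day
    (0.15, -270_000),   # stormy day: a loss, so the amount is negative
]

# Sanity check that the probabilities cover all cases
assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9

expected_earnings = sum(p * amount for p, amount in outcomes)
print(round(expected_earnings))
```

A positive expected value favors pursuing the concert, although it does not remove the 15% chance of a sizable loss.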
8. (5 sec) One of the regular surveys of Bureau of Agricultural Statistics is the Agricultural
Labor Survey. In this survey, data are collected semi-annually for palay and corn and annually
for coconut and sugarcane. In conducting the coconut survey, each data collector seeks coconut
farmers who hired laborers during the reference period until he/she has interviewed ten farmers.
This type of sampling procedure is called:
9. (10 sec) Suppose a sample of 50 bus commuters travels to work in an average of 30 minutes a
day, with a population standard deviation of 2.5 minutes. A 95% confidence interval for the true mean is
29.3 minutes to 30.7 minutes. The confidence interval for a population mean is obtained by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
What would happen to the confidence interval at the same 95% confidence level if the number
of bus commuters in the sample is increased by 100?
10. (5 sec) There are several methods of measuring degree of association or relationship between
variables. Which of the following is not a tool used in analyzing relationship of variables?
A. Exponential distribution
B. Regression Analysis
C. Pearson’s coefficient of correlation
D. Scatter diagram
1. (15 sec) In a bus terminal, one passenger bus is scheduled to arrive every 4 hours on the tour.
In practice, however, the scheduled bus may not arrive on the exact hour, but it is equally likely
to arrive at any time during the next 60 minutes. Suppose you arrive at the terminal at 5 pm, what
is the probability that you have to wait for more than 10 minutes for the 5 pm bus?
2. (25 sec) Suppose is used to represent a couple, and a pictogram of the number of
children of these couples is shown below
Table
Number of children
If the data is to be represented in a pie chart, which of the following is best suited to illustrate the
above information?
3. (60 sec) Three doughnuts for Floyd’s merienda are to be selected at random from a box of a
dozen doughnuts that contains three Bavarians, four choco honey dip and five choco butternuts.
What is the probability that Floyd will get one choco butternut and two choco honey dip?
Manufacturer A claimed that their brand of 250 mg Glutathione skin whitening pills could
change the skin tone of a brown-skinned Filipina weighing 70 kilos or more in less than 7 weeks
on the average.
Due to its popularity, various manufacturers of Glutathione supplement compete in market today.
Manufacturers of 250 mg Glutathione skin whitening is concerned how sown the whitening
effect of the pills would be? Where will your critical region be located?
5. (60 sec) A barangay official hypothesized that the proportion of residents who favor the
construction of a public utility terminal in their subdivision is 75%. A survey was then conducted
in the subdivision of 600 residents to elicit their responses. Out of 100 respondents, 80% were
in favor of having a jeepney terminal within the subdivision. At the 95% confidence level, is
there reason to accept the barangay official’s hypothesis?
$\sigma_p = \sqrt{\dfrac{p(1-p)}{n-1} \cdot \dfrac{N-n}{N-1}} = \sqrt{\dfrac{0.8(0.2)}{99} \cdot \dfrac{600-100}{600-1}}$
A. Yes, because the barangay official’s hypothesized proportion lies within the 95%
confidence interval
B. Yes, because the barangay official’s hypothesized proportion is 75% and is close to 80%
C. Yes, because the barangay official should know his business
D. No, because 75% is not equal to 80%
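The figures printed with this item (99, 600 and 100) suggest a standard error with a finite population correction. The sketch below assumes that form, with divisor n - 1 and z = 1.96 for 95% confidence, and checks whether the hypothesized 75% falls inside the resulting interval; the function name is illustrative.

```python
import math

def proportion_ci_fpc(p_hat: float, n: int, N: int, z: float = 1.96):
    """Confidence interval for a proportion with finite population correction."""
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1) * (N - n) / (N - 1))
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci_fpc(p_hat=0.80, n=100, N=600)
contains_claim = low <= 0.75 <= high
print(round(low, 3), round(high, 3), contains_claim)
```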
6. (40 sec) XYZ, a leading national TV broadcasting company, claims to have a 60% share of household
viewers of its daily routine show. Suppose ZTE, a rival broadcasting company, wants to verify
the claim by conducting a survey of 20 households, which begins at the time of the noontime
show.
If the claim of XYZ broadcasting company is true, what is the probability that exactly 10
households are tuned in to their noontime show?
A. 0.12 B. 0.18 C. 0.59 D. 0.87
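If each of the 20 surveyed households is treated as an independent trial with a 0.60 chance of being tuned in (the modeling assumption here), the count of tuned-in households is binomial. A short Python sketch of the binomial probability of exactly 10:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_10 = binomial_pmf(10, 20, 0.60)
print(round(p_exactly_10, 2))
```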
7. (5 sec) Based on the study of the Livestock and Poultry Division of Bureau Agricultural, a
regression analysis was used to determine whether the demand for broiler in metric tons (Y) can be
explained by the following variables: retail price of chicken (X1), retail price of beef (X2), and
personal consumption expenditure (X3), all in pesos.
$$Y_t = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + e$$
If the retail price of chicken increases by P2.00 and all other variables are held constant, what is
the change in the broiler demand?
8. (20 sec) The 2003 Functional Literacy, Education and Mass Media Survey (FLEMMS) conducted
by NSO aims to measure, among others, the simple or basic literacy rate and the functional
literacy rate of the population.
Simple or basic literacy rate is defined as the proportion of the population aged 10 and over who
can read and write with understanding in a language or dialect. “Functional literacy,” on the
other hand, refers to the proportion of the population aged 10 to 64 years old who have reading,
writing, and numeracy skills.
Using the table below, what can be said about the results of the 2003 FLEMMS in the National
Capital Region (NCR)?
9. (60 sec) If the quality grade-point averages of a random sample of college seniors are
normally distributed with a mean μ and standard deviation of 0.3, how large a sample is
required if we want to be 95% confident that our estimate of μ will not be off by more than 0.05?
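The sample size in item 9 follows from the usual requirement that the margin of error z·σ/√n not exceed the allowed error e, which gives n ≥ (z·σ/e)². A minimal sketch, assuming z = 1.96 for 95% confidence (the function name `required_sample_size` is hypothetical):

```python
import math

def required_sample_size(sigma, max_error, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= max_error."""
    # Always round up, since a fractional observation is not possible
    return math.ceil((z * sigma / max_error) ** 2)

# Values from item 9: standard deviation 0.3, maximum error 0.05
n_required = required_sample_size(sigma=0.3, max_error=0.05)
print("Required sample size:", n_required)
```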
10. (40 sec) A researcher assigned a number from 1 to 500 to his population of college students.
He needs a sample of 20 students for his study. If he uses systematic sampling, who will be the
10th member of his sample given that the first member is the 25th student in the population?
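Item 10 can be traced step by step. The sketch below assumes the common form of systematic sampling in which the sampling interval is k = N/n and every k-th unit after the first selected unit is taken; the function name `systematic_sample` is hypothetical.

```python
def systematic_sample(population_size, sample_size, first_member):
    """Return the selected unit numbers under systematic sampling."""
    interval = population_size // sample_size  # sampling interval k = N / n
    return [first_member + interval * i for i in range(sample_size)]

# Values from item 10: N = 500, n = 20, first selected unit is number 25
members = systematic_sample(500, 20, 25)
print("Sampling interval:", 500 // 20)
print("10th member:", members[9])
```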
ADDITIONAL TOPICS
Blaise Pascal and Pierre de Fermat, French mathematicians, through correspondence in 1654
on a problem in gambling, began the mathematical study of probability.
Carl Friedrich Gauss, German mathematician, is noted for his wide-ranging contributions to
physics, particularly the study of electromagnetism. In probability, he developed the
important method of least squares and the fundamental laws of probability distributions. The
normal probability graph is still called the Gaussian curve.
Bayes’ Theorem - a theorem that relates the probability of particular events taking place to the
probability that events conditional upon them have occurred.
Karl Pearson, British mathematician and philosopher of science, is best known for
developing some of the central techniques of modern statistics, and for applying these techniques
to the problem of biological inheritance or heredity. Pearson's research laid much of the
foundation for 20th-century statistics, defining the meanings of correlation, regression analysis,
and standard deviation.
Francis Galton, British scientist, attempted to find statistical relationships to explain how
biological characteristics were passed down through succeeding generations.
The Roman Empire was the first government to gather extensive data about the population,
area, and wealth of the territories that it controlled.
Census is a term usually referring to an official count by a national government of its country’s
population. A population census determines the size of a country’s population and the
characteristics of its people, such as their age, sex, ethnic background, marital status, and income.
National governments also conduct other types of censuses, particularly of economic activity. An
economic census collects information on the number and characteristics of farms, factories,
mines, or businesses.
1. The population census is used to determine the number of representatives each area within
the country is legally entitled to elect to the national legislature.
2. In the United States, the Constitution provides that seats in the House of Representatives
should be apportioned to the states according to the number of their inhabitants.
3. Census figures determine how many seats each state should have in the House and in the
electoral college, the body that nominally elects the president and vice president of the
United States. This process is known as reapportionment.
4. States use population census figures as a basis for allocating delegates to the state
legislatures and for redrawing district boundaries for seats in the House, in state
legislatures, and in local legislative districts.
5. In Canada, census population data are similarly used to apportion seats among the provinces
and territories in the House of Commons and to draw electoral districts.
6. Governments find population census information of great value in planning public services
because the census tells how many people of each age live in different areas. They
use census data to determine how many children an educational system must serve, to
allocate funds for public buildings such as schools and libraries, and to plan public
transportation systems. They can also determine the best locations for new roads, bridges,
police departments, fire departments, and services for the elderly.
Besides governments, many others use census data. Private businesses analyze population and
economic census data to determine where to locate new factories, shopping malls, or banks; to
decide where to advertise particular products; or to compare their own production or sales
against the rest of their industry. Community organizations use census information to develop
social service programs and child-care centers. Censuses make a huge variety of general
statistical information about society available to researchers, journalists, educators, and the
general public.
COMPONENTS
A. Market Basket
The CPI market basket contains a sample of goods and services commonly purchased by a group
of households in a particular area. The 1988-based CPI series has 13 regional market baskets
which are the combined market baskets of the bottom-30% and upper-70% income groups for
each region.
Computation of the provincial CPI uses the market basket for the region where the province
belongs.
The number of items comprising the market basket for all-income group for each region is
shown below:
B. Weighting System
Weights used in the current CPI series were derived from the results of the 1988 Family Income
and Expenditures Survey (FIES). The weight is computed as the proportion of expenditure on a
specific group of items to total expenditure.
Aggregated weights for the six major item groups are shown below:
C. Base Period
The CPI series constructed by NSO since 1945 has undergone several revisions. The 1988-based
CPI, the current series, is the sixth rebasing.
D. Index Formula
The construction of the CPI basically uses a Laspeyres Formula (fixed base year weights).
The formula is modified as the weighted arithmetic mean of price relatives. That is,
$$\text{Index} = \frac{\sum (P_n / P_o)(P_o \cdot Q_o)}{\sum P_o \cdot Q_o}$$
where:
Pn = current price
Po = base year price or base price
Po*Qo = base year weight
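The weighted mean of price relatives can be illustrated with a small computation. This is a sketch only: the three items, their prices, and their base-year quantities are hypothetical values chosen for illustration, and the function name `laspeyres_index` is not from the source. Multiplying the result by 100 expresses it on the scale where the base period equals 100.

```python
def laspeyres_index(current_prices, base_prices, base_quantities):
    """Weighted arithmetic mean of price relatives with base-year weights."""
    # Base-year weight of each item: Po * Qo
    weights = [p0 * q0 for p0, q0 in zip(base_prices, base_quantities)]
    # Sum of price relatives (Pn / Po), each multiplied by its base-year weight
    weighted_relatives = sum(
        (pn / p0) * w for pn, p0, w in zip(current_prices, base_prices, weights)
    )
    return weighted_relatives / sum(weights)

# Hypothetical market basket of three items (illustrative numbers only)
base_prices = [20.0, 50.0, 10.0]      # Po
base_quantities = [10, 4, 30]         # Qo
current_prices = [25.0, 55.0, 12.0]   # Pn

index = laspeyres_index(current_prices, base_prices, base_quantities)
print(f"Index (base period = 100): {index * 100:.2f}")
```

Because the weights are fixed at base-year values, only the current prices need to be collected again in each survey round.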
E. Sample Outlets
Sample outlets are establishments where prices of sample commodities are quoted. There are
about 9,000 outlets nationwide.
The selected outlets are permanent sources of price data and are not changed at will. An outlet
is replaced only when necessary for the following reasons:
1. Closing of business
2. Disappearance of item from the stock for more than three consecutive months or permanently
An outlet may be abandoned completely or only partly, i.e., when one or more items in the survey
list have disappeared from its stock. It is replaced with the nearest retail outlet, that is, one
within the vicinity of the replaced outlet. The choice of replacement is left to the discretion of
the price canvasser, using the criteria for regular outlet selection. Once a substitute outlet has
been selected, it becomes a permanent outlet for the succeeding survey rounds.
DEFINITION OF TERMS
A. Consumer Price Index
Consumer price index (CPI) is a measure of change in the average retail prices of goods
and services commonly purchased by a particular group of people in a particular area.
B. Market Basket
Market basket refers to a sample of goods and services used to represent all goods and
services bought by a particular group of consumers in a particular area.
C. Base Period
Base period, usually a year, is the reference period of the index number. It is the period at
which the index is set to 100.
D. Sample Outlets
Sample outlets are outlets or establishments where prices of sample commodities are
quoted.
E. Weight
F. Inflation Rate
Inflation rate (IR) is the annual rate of change or year-on-year change in CPI. That is,
$$\text{Inflation Rate (IR)} = \frac{CPI_n - CPI_o}{CPI_o} \times 100$$
where CPIn is the CPI for the current period and CPIo is the CPI for the same period of the
previous year.
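A short sketch of the inflation-rate formula, assuming CPIn is the CPI of the current period and CPIo is the CPI of the same period one year earlier. The CPI values below are hypothetical and the function name `inflation_rate` is not from the source.

```python
def inflation_rate(cpi_current, cpi_previous):
    """Year-on-year percentage change in the CPI."""
    return (cpi_current - cpi_previous) / cpi_previous * 100

# Hypothetical CPI values for the same month in two consecutive years
cpi_last_year = 120.0
cpi_this_year = 126.0

rate = inflation_rate(cpi_this_year, cpi_last_year)
print(f"Inflation rate: {rate:.1f}%")
```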
Purchasing Power of the Peso (PPP) shows how much the peso in the base period is
worth in another period. It gives an indication of the real value of the peso in a given
period relative to the peso value in the base period.
Wholesale price index (WPI) measures the monthly changes in the general price level of
commodities (usually in large quantities) that flow into the wholesale trading system.
The 1978-based WPI series has a total of 376 commodities or items traded in the
wholesale market. These items include producer's materials, consumer goods and capital
goods which may either be raw materials, intermediate products or finished goods.
Moreover, they may also be domestically produced (including exports) or imported for
resale. These items are grouped according to the Philippine Standard Commodity
Classification (PSCC).
The weights of the current WPI utilize the value of sales of commodities traded in the
wholesale market in 1978 as derived from the 1974 Input-Output tables. The index covers only
the National Capital Region or Metro Manila. The weighted average of relatives method,
basically the Laspeyres formula, is used in the construction of the WPI.
Retail price index (RPI) is a measure of the changes in the retail price at which retailers
dispose of their goods to consumers or end-users.
The current RPI still uses 1978 as the base year and covers only the National Capital
Region (NCR) or Metro Manila.
While the 1972-based series was computed using the geometric mean without any
weighting pattern, the present series is constructed using the weights based on the 1974
Input-Output tables on the values of expenditures of goods and services of consumers
from the retail sector, estimated at 1978 prices. The weighted average of relatives method,
basically the Laspeyres formula, is used in the computation of the index.
The present market basket has a total of 479 commodities grouped according to the Philippine
Standard Commodity Classification (PSCC).
Agricultural production – the growing of field crops, fruits, nuts, seeds, tree nurseries (except
those of forest trees), bulb vegetables and flowers, both in the open and under glass;
the production of coffee, tea, cocoa, and rubber; and the production of livestock and livestock
products, honey, rabbits, fur-bearing animals, silkworm cocoons, etc. Forestry and fishery
production carried on as an ancillary activity on an agricultural holding is also considered
as agricultural production.
Constant Prices (at constant prices) – valuation of transactions, wherein the influence of price
changes from the base year to the current year has been removed.
Gross Domestic Product – the value of all goods and services produced domestically; the sum
of gross value added of all resident institutional units engaged in production (plus any
taxes, and minus any subsidies, on products not included in the values of their outputs).
Gross Regional Domestic Product - aggregate of the gross value added or income from each
industry or economic activity of the regional economy.
Gross National Product – the Gross Domestic Product adjusted with the net factor income from
the rest of the world. It refers to the aggregate earnings of the factors of production
(nationals) plus indirect taxes (net) and capital consumption allowance.
Gross Value Added – the difference between gross output and intermediate inputs. The gross
output of a production unit during a given period is equal to the gross value of the goods
and services produced during the period and recorded at the moment they are produced,
regardless of whether or not there is a change of ownership. Intermediate inputs refer to
the value of goods and services used in the production process during the accounting
period.
Basic or Simple Literacy - the ability to read and write simple messages with understanding in
any language or dialect.
Functional Literacy – represents a significantly higher level of literacy which includes not only
reading and writing skills but also numeracy skills. The skills must be sufficiently
advanced to enable the individual to participate fully and effectively in activities
commonly occurring in his life situation that require a reasonable capability beyond oral
and written communication.
Consumer Price Index (CPI) - measure of the average changes in the prices of a fixed basket of
goods and services usually purchased by households for their consumption.
Wholesale Price Index (WPI) - measure of the changes in the price level of commodities that
flow into the wholesale trade intermediaries.
Retail Price Index (RPI) - measure of the changes of the prices at which retailers dispose of
their goods to consumers and end-users.
Family expenditures - refer to the expenses or disbursements made by the family purely for
personal consumption during the calendar year 1997. They exclude all expenses in
relation to farm or business operation, investment ventures, purchase of real property and
other disbursements which do not involve personal consumption.
Income from other sources - includes imputed rental values of owner-occupied dwelling units,
interests, rentals including landowner's share of agricultural products, pensions, support and
the value of food and non-food items received as gifts by the family (as well as the imputed
value of services rendered free of charge to the family).
Per capita income - is obtained by dividing the total family income by the total number of
family members.
Primary income - includes salaries and wages, commissions, tips, bonuses, family and clothing
allowance, transportation and representation allowances, honoraria, and other forms of
compensation and net receipts derived from the operation of family-operated
enterprises/activities and the practice of a profession or trade.
Total family income - includes primary income and receipts from other sources received by all
family members during the calendar year 1991 as participants in any economic activity or as
recipients of transfers, pensions, grants, etc.
Magnitude of the Poor - the number of families or the population whose annual per capita
income falls below the subsistence/poverty threshold.
Poverty Incidence - proportion of families/population whose annual per capita income falls
below the annual per capita poverty threshold to the total number of families/population.
Poverty Threshold – annual per capita income required or the amount to be spent to satisfy
nutritional requirements (2,000 kcal) and other basic needs.
Subsistence Incidence - proportion of families/population whose annual per capita income falls
below the annual per capita food/subsistence threshold to the total number of families/population.
Subsistence and Food Threshold – annual per capita income required or the amount to be spent
to satisfy nutritional requirements (2,000 kcal).
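The income and poverty definitions above can be tied together in a short computation. This is a sketch under stated assumptions: the five families, their incomes, and both threshold amounts are hypothetical figures chosen for illustration, the helper `incidence` is not from the source, and incidence is reported here as a percentage of families.

```python
# Hypothetical families: (total annual family income in pesos, number of members)
families = [(60000, 5), (150000, 4), (48000, 6), (90000, 3), (36000, 2)]

# Hypothetical annual per capita thresholds, for illustration only
POVERTY_THRESHOLD = 15000
SUBSISTENCE_THRESHOLD = 10000

# Per capita income: total family income divided by number of family members
per_capita = [income / members for income, members in families]

def incidence(per_capita_incomes, threshold):
    """Return (number of families below threshold, percentage of all families)."""
    below = sum(1 for pci in per_capita_incomes if pci < threshold)
    return below, below / len(per_capita_incomes) * 100

magnitude_poor, poverty_incidence = incidence(per_capita, POVERTY_THRESHOLD)
magnitude_subsistence, subsistence_incidence = incidence(per_capita, SUBSISTENCE_THRESHOLD)

print("Magnitude of the poor (families):", magnitude_poor)
print(f"Poverty incidence: {poverty_incidence:.1f}%")
print(f"Subsistence incidence: {subsistence_incidence:.1f}%")
```

Since the subsistence threshold covers food needs only, it is lower than the poverty threshold, so subsistence incidence cannot exceed poverty incidence for the same set of families.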
Average Total Employment – obtained by adding the total employment during the pay periods
nearest the middle of each quarter (Feb. 15, May 15, Aug. 15, and Nov. 15) and dividing the sum
by four.
Balance of Payments (BOP) – statistical statement that systematically summarizes, for a
specific period, the economic transactions of a country with the rest of the world. Transactions,
for the most part between residents and non-residents, consist of those involving goods, services
and income; those involving financial claims on and liabilities to the rest of the world; and those
(such as gifts) classified as transfers which are real resources and financial claims provided to, or
received from the rest of the world without the corresponding resources and financial claims
received or given in exchange.
The United Nations Secretariat, specialized agencies of the UN system, and representatives of
the International Monetary Fund (IMF), the World Bank, and the Organization for Economic
Co-operation and Development (OECD), as well as international experts, identified and selected
the 48 MDG indicators.
Condom use rate of the contraceptive prevalence rate
Number of children orphaned by HIV/AIDS (to be measured by the ratio or proportion of
orphans to non-orphans aged 10-14 who are attending school)
Prevalence and death rates associated with malaria
Proportion of population in malaria risk areas using effective malaria prevention and
treatment measures
Prevalence and death rates associated with tuberculosis
Proportion of tuberculosis cases detected and cured under directly observed treatment
short course (DOTS)