New Week 3 4
Observation
This method of gathering data is usually used in situations where the respondents cannot
answer the researcher's questions directly. The observation is structured to elicit
information that can be coded to give numerical data. As a researcher, you have to prepare
a checklist with an appropriate rating scale that categorizes the behavior, attitude, or
attribute you are observing in order to answer the questions posed in your study. As you
observe, you record your observations using checkmarks or cross marks on your checklist.
Survey
Quantitative data can be collected using four (4) main types of survey:
Sample survey
The researcher collects data from a sample of a population to estimate the attributes
or characteristics of the population. Examples of sample surveys include customer
satisfaction, health care, political, market research, and academic or education surveys.
At present, surveys gathering feedback from parents and teachers on the K-12
implementation are very timely.
Administrative data
This is a survey on the organization's day-to-day operations. This kind of data is now
supported by various ICT tools and software, making it easy for organizations,
especially government agencies, schools, industries, and NGOs, to update their records
efficiently and effectively and to set up their own Management Information Systems (MIS).
Census
The researcher collects data from all members of the selected population. It is an official
count or survey of a population with details on demographic, economic, and social data such
as age, sex, education, marital status, household size, occupation, religion,
employment data, educational qualifications, and housing. The collected data are
usually used by government or private firms for planning purposes and development
strategies.
Tracer studies
In school settings, tracer studies are used by educational institutions to follow up on their
graduates. The survey is usually sent to a random sample of graduates one or two years
after they complete their courses. Tracer studies gather data on employment,
current occupation, and competencies needed in the workplace to
determine gaps in curriculum and other related activities between academe and
industry.
Quantitative Interview
The interview may be used for both quantitative and qualitative research studies. In
conducting a quantitative interview, the researcher prepares an interview guide or schedule.
It contains the list of questions and answer options that the researcher will read to the
respondent.
Questionnaire
A questionnaire may be standardized or researcher-made. A standardized questionnaire has
gone through the process of psychometric validation and has been piloted and revised.
1.1 Mean
Often called the arithmetic average of a set of data, the mean is the sum of the
observed values in the distribution divided by the number of observations. It is
frequently used for interval or ratio data. The symbol X̅ (x bar) is used to denote the
arithmetic mean.
The mean is calculated by summing up the observations (items, height, scores, or
responses) and dividing by the number of observations.
Mean = (sum of the observations) / (number of observations)
Or
$\bar{X} = \frac{\sum x}{n}$
The following examples show the calculation of the mean for ungrouped data, that is,
a list of data that is not organized in any way.
For Ungrouped data
Example 1: Find the mean of the measurements 18, 26, 27, 29, and 30.
Solution:
Substitute the measurements into the formula.
$\bar{X} = \frac{\sum x}{n} = \frac{18 + 26 + 27 + 29 + 30}{5} = \frac{130}{5} = 26$
Note that the mean falls near the middle of the data set.
Answer: X̅ = 26
Mean ($\bar{X}$) $= \frac{\sum x}{n} = \frac{1960}{20} = 98$
You can use the mean when the numbers you have can be added or when
characteristics are measured on a numerical scale like those used to describe
height, weight, or scores on a test.
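The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` is chosen here for illustration and is not part of the module.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Measurements from Example 1 (ungrouped data)
measurements = [18, 26, 27, 29, 30]
print(mean(measurements))  # 26.0
```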
For Grouped Data
When the observations are grouped into classes, the formula for grouped data is
as follows:
Mean ($\bar{X}$) = sum of (frequency of each class × class midpoint) / total number of observations
$\bar{X}_w = \frac{\sum fx}{n}$
where:
f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set
Example 1: Find the mean of the heights of 50 senior high school students
summarized as follows
Height in inches (x)   Frequency (f)   Height × Frequency (fx)
56                     6               336
57                     15              855
58                     12              696
59                     8               472
60                     5               300
61                     2               122
62                     2               124
                       ∑f = 50         ∑fx = 2905
Solution:
Using the above data, the weighted mean is equal to the sum of the column fx,
divided by the total number of observations.
Weighted Mean: $\bar{X}_w = \frac{\sum fx}{n} = \frac{2905}{50} = 58.1$ inches
When the data are grouped into classes, the class midpoint represents the x in
the formula.
Example 2 : Solve for the mean of the data below.
Class     Frequency (f)   Class Midpoint (x)   fx
76-80     3               78                   234
71-75     5               73                   365
66-70     6               68                   408
61-65     8               63                   504
56-60     10              58                   580
51-55     7               53                   371
46-50     7               48                   336
41-45     3               43                   129
36-40     1               38                   38
Total     50                                   2965
Solution:
Mean ($\bar{X}$) $= \frac{\sum fx}{n} = \frac{2965}{50} = 59.3$
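The grouped-data calculation can be sketched in Python as follows. The class limits and frequencies are those of Example 2; the helper names `class_midpoint` and `grouped_mean` are illustrative choices, not part of the module.

```python
def class_midpoint(lower, upper):
    """Midpoint of a class interval, used as x in the grouped-data formula."""
    return (lower + upper) / 2


def grouped_mean(classes):
    """Mean for grouped data: sum of f * x divided by n (the sum of f).

    `classes` is a list of (lower limit, upper limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be greater than zero")
    total_fx = sum(f * class_midpoint(lo, hi) for lo, hi, f in classes)
    return total_fx / n


# Classes and frequencies from Example 2
classes = [
    (76, 80, 3), (71, 75, 5), (66, 70, 6), (61, 65, 8), (56, 60, 10),
    (51, 55, 7), (46, 50, 7), (41, 45, 3), (36, 40, 1),
]
print(grouped_mean(classes))  # 59.3
```

The same function reproduces the heights example if each height is entered as a class whose lower and upper limits are equal.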
1.3 Median
The median is the midpoint of the distribution. It represents the point in the data
where 50% of the values fall below that point and 50% fall above it. When a
distribution has an even number of observations, the median is the average of the
two middle scores. The median is the most appropriate measure of central
tendency for ordinal data.
Example 2:
Consider this set with an even number of values:
12, 15, 18, 22, 30, 32.
The two middle values are 18 and 22. Taking the average of the two middle numbers,
(18 + 22) / 2 = 40 / 2, the median is 20.
Answer: The Median is 20.
Example 3:
Find the median for the set of measurements.
15, 20, 12, 26, 3, 30, 14
Solution
We first rank the measurements from the smallest to the largest: 3, 12, 14, 15,
20, 26, 30. Since the number of cases is odd (n = 7), the median has rank
(n + 1) / 2 = 4, which is the fourth value, 15.
Answer: The Median is 15.
Suppose the last number is 32 (rather than 30); the median is still 15. Unlike the
mean, the median is not affected by extreme values in the distribution.
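The odd and even cases can be sketched together in Python. The function name `median` is an illustrative choice; the data are those of Examples 2 and 3.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # rank (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([12, 15, 18, 22, 30, 32]))      # 20.0
print(median([15, 20, 12, 26, 3, 30, 14]))   # 15
print(median([15, 20, 12, 26, 3, 32, 14]))   # 15, unchanged by the larger extreme value
```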
$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$
Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of items or observations
F = cumulative frequency in the class preceding the median class
f = frequency of the median class
In the following examples, the use of the step by step procedure will be
illustrated:
Example 1
The following data show the distribution of the ages of people interviewed for a
survey on a topic about climate change.
Class Interval   Frequency (f)   Cumulative Frequency (F)
11-20            20              20
21-30            14              34
31-40            22              56
41-50            18              74
51-60            14              88
61-70            12              100
                 ∑f = 100
Since there are 100 values in the data set, the median will represent the
$\left(\frac{n}{2}\right)$th or the $\left(\frac{100}{2}\right)$th value, that is, the
50th value in rank order.
Determine in which class the 50th value falls. The first two classes have a
cumulative frequency of 34.
We need another 16 values to reach 50. Thus, the 50th value falls in the next
class which contains 22 values. The median class then is 31-40.
Thus,
L = 30.5
n = 100
F = 34
f = 22
i = 10
Substitute all these values using the following formula.
$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$
= 30.5 + 10 ((50 − 34) / 22)
= 30.5 + 10 (16 / 22)
= 30.5 + 160 / 22
= 30.5 + 7.27
= 37.77
This means that 50% or 50 of the 100 ages will fall below 37.77 and 50% or 50
will fall above it.
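The step-by-step procedure for the grouped median can be sketched in Python. The function name `grouped_median` and its argument layout are assumptions made for illustration; the exact lower limits (10.5, 20.5, and so on) follow the convention used for L = 30.5 in the example.

```python
def grouped_median(exact_lower_limits, frequencies, interval):
    """Median = L + i * ((n / 2 - F) / f) for data grouped into equal classes.

    Classes must be listed from lowest to highest.
    """
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(exact_lower_limits, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class: interpolate inside it.
            return lower + interval * (half - cumulative) / f
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


# Age distribution from Example 1
exact_lower_limits = [10.5, 20.5, 30.5, 40.5, 50.5, 60.5]
frequencies = [20, 14, 22, 18, 14, 12]
print(round(grouped_median(exact_lower_limits, frequencies, 10), 2))  # 37.77
```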
1.4 Mode
The mode is the most frequently occurring value in a set of observations. When
two values share the highest frequency, the distribution is bimodal; when more
than two values share the highest frequency, it is multimodal. When every item
occurs the same number of times, there is no mode. The mode is
appropriate for nominal data.
Example 1:
The ages of fifteen (15) persons assembled in a room are as follows:
16, 18, 18, 18, 25, 25, 25, 30, 34, 36 and 38.
Solution:
An age of 25 is the mode because it has been recorded three times in the
sample, more than any other age.
Answer: Mode = 25
Example 2:
The number of hours spent by 10 students in an internet café was as follows.
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
Solution:
Both 2 and 4 have a frequency of 3. The data is therefore bimodal.
Answer: Mode = 2 and 4
Example 3:
Referring to the data on the distribution of the ages of 100 people interviewed for
a survey on a topic of national interest, the modal class is 31-40. The mode,
which corresponds to the class midpoint, would be $\frac{31 + 40}{2} = 35.5$.
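A short Python sketch can find the mode or modes of ungrouped data. The function name `modes` is illustrative, and returning an empty list for the "no mode" case is a design choice made here, not something the module prescribes. The data are the internet café hours from Example 2.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if every value is equally frequent."""
    if not values:
        raise ValueError("modes requires at least one observation")
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == highest)


hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
print(modes(hours))  # [2, 4], so the data are bimodal
```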
2. Measures of Dispersion
Suppose you ask a group of senior high school students to rate the quality of
food at the school canteen and you find out that the average rating is 3.5 using
the following scale: 5 (Excellent); 4 (Very Satisfactory); 3 (Satisfactory); 2 (Fair);
and 1 (Poor).
How close are the ratings given by the students? Do their ratings cluster
around the middle point of 3, or are their ratings spread or dispersed, with some
students giving ratings of 1 and the rest giving ratings of 5?
The extent of the spread, or the dispersion of the data is described by a group
of measures called measures of dispersion, also called measures of variability.
The measures to be considered are the range, average or mean deviation,
standard deviation and the variance.
Average Deviation: $AD = \frac{\sum |x - \bar{x}|}{n}$
Example 1:
Consider a set of values which consists of 20, 25, 35, 40, 45. Solving for the
mean, $\bar{x} = \frac{20 + 25 + 35 + 40 + 45}{5}$, the mean is 33. Find the average deviation.
Solution:
$AD = \frac{|-13| + |-8| + |2| + |7| + |12|}{5}$
$= \frac{13 + 8 + 2 + 7 + 12}{5}$
$= \frac{42}{5}$
AD = 8.4
Thus, on the average, each value is 8.4 units from the mean.
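The same computation can be sketched in Python; the function name `average_deviation` is an illustrative choice.

```python
def average_deviation(values):
    """Mean of the absolute deviations of each value from the arithmetic mean."""
    n = len(values)
    if n == 0:
        raise ValueError("average_deviation requires at least one observation")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(average_deviation([20, 25, 35, 40, 45]))  # 8.4
```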
Example 1:
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Let us consider the same data used in the illustration for the range. The
values are 6, 10, 12, 15, 18, 18, 20, 23, 25, 28.
Solution:
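1. Compute the mean
$\bar{x} = \frac{6 + 10 + 12 + 15 + 18 + 18 + 20 + 23 + 25 + 28}{10}$
$= \frac{175}{10}$
$\bar{x} = 17.5$
2. Subtract the mean $\bar{x}$ from each score (x), or $x - \bar{x}$
3. Square each difference from Step 2, or $(x - \bar{x})^2$
Score (x)   (x − x̄)   (x − x̄)²
6           −11.5      132.25
10          −7.5       56.25
12          −5.5       30.25
15          −2.5       6.25
18          0.5        0.25
18          0.5        0.25
20          2.5        6.25
23          5.5        30.25
25          7.5        56.25
28          10.5       110.25
                       ∑(x − x̄)² = 428.5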
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{428.5}{10 - 1}}$
$= \sqrt{47.61}$
$= 6.9$
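The three steps (mean, deviations, squared deviations) can be sketched in Python. The function name `sample_sd` is illustrative; it divides by n − 1, as in the formula above.

```python
import math


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_sd requires at least two observations")
    mean = sum(values) / n                                   # Step 1
    sum_of_squares = sum((x - mean) ** 2 for x in values)    # Steps 2 and 3
    return math.sqrt(sum_of_squares / (n - 1))


scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(round(sample_sd(scores), 2))  # 6.9
```

Python's standard library offers the same calculation as `statistics.stdev`, which can serve as a cross-check.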
Interpretation of the Standard Deviation
The standard deviation allows you to reach conclusions about scores in
the distribution. The following conclusions can be reached if the distribution
of scores is normal:
1. Approximately 68% of the scores in the sample fall within one
standard deviation of the mean.
2. Approximately 95% of the scores in the sample fall within two
standard deviations of the mean.
3. Approximately 99.7% of the scores in the sample fall within three
standard deviations of the mean.
4. In our example, with a mean of 17.5 and a standard deviation of 6.9, we
can say that
68% of the scores will fall in the range
= (17.5 − 6.9) to (17.5 + 6.9)
= 10.6 to 24.4
5. Likewise, 95% of the scores will fall in the range
= 17.5 − (2)(6.9) to 17.5 + (2)(6.9)
= (17.5 − 13.8) to (17.5 + 13.8)
= 3.7 to 31.3
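The ranges above can be recomputed directly from the ten scores, as a rough sketch. The helper name `sd_range` is an assumption for illustration; the mean and sample standard deviation come from Python's `statistics` module, and results are rounded to one decimal place.

```python
from statistics import mean, stdev


def sd_range(center, sd, k):
    """Interval from k standard deviations below to k above the mean."""
    return (round(center - k * sd, 1), round(center + k * sd, 1))


scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
center = mean(scores)   # 17.5
sd = stdev(scores)      # about 6.9 (sample standard deviation, n - 1)

print(sd_range(center, sd, 1))  # (10.6, 24.4): about 68% of scores if normal
print(sd_range(center, sd, 2))  # (3.7, 31.3): about 95% of scores if normal
```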
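z = [formula missing in source]
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
Where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance
$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$ (df = n − 1)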
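The three t statistics can be sketched in Python as follows. The function names and the sample data are hypothetical and exist only to show the arithmetic; the hypothesized difference (μ1 − μ2, or μd for paired data) defaults to 0. The pooled version uses the usual pooled variance, ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2).

```python
import math
from statistics import mean, stdev, variance


def t_separate_variances(sample1, sample2, mu_diff=0.0):
    """t with each sample's own variance in the standard error."""
    n1, n2 = len(sample1), len(sample2)
    se = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return ((mean(sample1) - mean(sample2)) - mu_diff) / se


def t_pooled_variance(sample1, sample2, mu_diff=0.0):
    """t with a single pooled variance s_p^2 for both samples."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return ((mean(sample1) - mean(sample2)) - mu_diff) / se


def t_paired(first, second, mu_d=0.0):
    """t for paired observations, based on the differences d; df = n - 1."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return (mean(diffs) - mu_d) / (stdev(diffs) / math.sqrt(n))


# Hypothetical data, for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
print(round(t_separate_variances(group_a, group_b), 3))
print(round(t_pooled_variance(group_a, group_b), 3))
print(round(t_paired([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 3))
```

The computed t is then compared with the critical value from a t table at the chosen level of significance and degrees of freedom.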
Between Proportions and Percentages
For independent samples
z = [formula missing in source]
$z = \frac{D - A}{\sqrt{A + D}}$
$z = \frac{p_1 - p_2}{\sqrt{\frac{a + d}{N}}}$
Analysis of Variance
ANOVA is used when the significance of the difference among the means of two or more
groups is to be determined at one time.
2. Test of Relationship
Spearman Rank-Order Correlation or Spearman rho. This is used
when data available are expressed in terms of ranks (ordinal
variables).
$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
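The Spearman formula can be sketched in Python. The function name `spearman_rho` and the two sets of ranks are hypothetical; D is taken as the difference between the paired ranks, and the sketch assumes the data are already ranked with no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D the difference in paired ranks."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rankings must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two pairs of ranks are required")
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given to five items by two judges
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rho(judge_1, judge_2))  # 0.8
```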
Chi-Square Test for independence. This is used when data are
expressed in terms of frequencies or percentages (nominal
variables).
Case 1: Multinomial
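$\chi^2 = \sum \frac{(O - E)^2}{E}$, with $df = (r - 1)(c - 1)$
Where O = observed frequency, E = expected frequency, r = number of rows, and c = number of columns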
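The chi-square computation can be sketched in Python. The function name and the 2 × 2 table are hypothetical. The sketch also assumes the usual way of getting each expected frequency for a test of independence, (row total × column total) / grand total, which is not spelled out in the text above.

```python
def chi_square_independence(observed):
    """Return (chi-square, df) for a table of observed frequencies.

    `observed` is a list of rows; every row and column total must be positive.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi_square += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df


# Hypothetical 2 x 2 table of frequencies
table = [[10, 20],
         [20, 10]]
chi_square, df = chi_square_independence(table)
print(round(chi_square, 2), df)  # 6.67 1
```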
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
T-test to test the Significance of Pearson r. The t-test for
the significance of Pearson r is used to determine whether the
value of the computed coefficient of correlation is significant.
That is, does it represent a real correlation, or is the obtained
coefficient of correlation merely brought about by chance?
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
The formula
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$
Where
r = correlation coefficient
n = number of samples
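The correlation coefficient and its t value can be sketched together in Python. The function names and the paired data are hypothetical; `pearson_r` uses the deviation-score form of r, and n is the number of (x, y) pairs.

```python
import math


def pearson_r(xs, ys):
    """r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def t_for_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); undefined when r is exactly 1 or -1."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(t_for_r(r, len(x)), 4))
```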
Statistical Significance
Statistically significant means that a relationship between two or more variables is
caused by something other than random chance. Significant also means probably true
(not due to chance). When the result is highly significant, it means that it is very probably
true.
The level of significance shows how likely it is that the results of your data are due to chance.
A 95% chance of being true indicates that the finding has a five percent chance of not being true.
A .05 level of significance means that there is a 95% chance that the finding is true.
Statistical hypothesis testing is used to determine whether the result of a data set is
statistically significant.
Hypothesis
A hypothesis is a preconceived idea, assumed to be true, that must be tested for its
truth or falsity. Let us suppose a researcher is concerned with testing the relationship
between variables. Through inferential statistical measures, the researcher can discover
important information even if no relationship is established between the variables. It is
possible for the researcher to discover differences and, therefore may test individual or
Consequently, the researcher should think of inferential statistics in terms of whether
it tests for relationship or association or whether it tests for comparison or difference.
The two types of hypothesis are the null hypothesis and the alternative hypothesis.
Null hypothesis is denoted by $H_0$
The null hypothesis is the hypothesis that is always tested by a researcher. The
hypothesis indicates that there is no difference between the group means in the
comparison.
Alternative hypothesis is denoted by $H_a$
The alternative hypothesis on the other hand, indicates that there is a true
difference between the group means.
The results will show that (1) either there is a meaningful difference between the two
groups, thus, you reject the null hypothesis or (2) the difference between the two groups is
not large enough to conclude that the groups are different thus you fail to reject the null
hypothesis. If the null hypothesis is rejected, then the alternative hypothesis is
accepted.
Example:
Let us suppose that an advertising agency is conducting an experiment using
two different methods of marketing strategies (X and Y) to grade 11 students.
The results of the experiment will be measured using the monthly sales of the
company.
A. Strategy X is equal to strategy Y, that is, (X = Y)
B. Strategy X is better than strategy Y, that is, (X > Y)
C. Strategy X is poorer than strategy Y, that is, (X < Y)
Outcome A forms the basis of the null hypothesis, a statement of no
difference in monthly sales in the populations being compared.
5. Determine the critical value the test statistic must attain to be significant.
After you have computed the test statistic, you must look up the critical value
in the appropriate table for the distribution. The critical value separates the region of
rejection from the region of acceptance of the null hypothesis. The areas of
acceptance and rejection in a standard normal distribution, using α = .05, are illustrated
below.
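For the standard normal distribution, the critical value can also be computed instead of read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function names `critical_z` and `decide` are illustrative, and a two-tailed test is assumed unless stated otherwise.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_tailed=True):
    """Critical value of z that cuts off the rejection region at level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def decide(z, alpha=0.05):
    """Two-tailed decision rule: reject H0 when |z| exceeds the critical value."""
    return "reject H0" if abs(z) > critical_z(alpha) else "fail to reject H0"


print(round(critical_z(0.05), 2))  # 1.96
print(decide(2.5))                 # reject H0
print(decide(1.0))                 # fail to reject H0
```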
Generalization
The important points of this Self Learning Kit (SLK) are listed below.
Formula
Measures of Central Tendency
  Mean
    For Ungrouped Data: $\bar{X} = \frac{\sum x}{n}$
    For Grouped Data: $\bar{X}_w = \frac{\sum fx}{n}$
  Median
    For Ungrouped Data:
      n is odd = $\left(\frac{n + 1}{2}\right)$th term
      n is even = $\bar{X}$ of two middle terms
    For Grouped Data: $\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$
  Mode
    For Ungrouped Data: the most frequent
Measures of Dispersion
  Range: Highest score − Lowest score
  Average Deviation
    For Ungrouped Data: $AD = \frac{\sum |x - \bar{x}|}{n}$
  Standard Deviation
    For Ungrouped Data: $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Correlation Coefficient
  Spearman Rank-Order correlation: $\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
  Pearson Product: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$