Study Schedule Topic Learning Outcomes Activities Week 4
MATHEMATICS AS A TOOL
Overview
You may be familiar with statistics through radio, television, newspapers, and
magazines. For example, you may have read statements like the following found in
newspapers. According to the PISA 2018 profile of the Philippines, socio-economic status
accounts for 18% of the variance in reading performance in the country, compared to the
OECD (Organization for Economic Cooperation and Development) average of 12%.
Statistics is used in almost all fields of human endeavor. In sports, for example, a
statistician may keep records of the number of yards a running back gains during a football
game, or the number of hits a baseball player gets in a season. In other areas, such as public
health, an administrator might be concerned with the number of residents who contract a
new strain of flu virus during a certain year. In education, a researcher might want to know if
new methods of teaching are better than old ones. These are only a few examples of how
statistics can be used in various occupations. Furthermore, statistics is used to analyze the
results of surveys and as a tool in scientific research to make decisions based on controlled
experiments. Other uses of statistics include operations research, quality control, estimation,
and prediction.
Introduction
mathematics are useful in processing and managing numerical data in order to describe a
phenomenon and predict values.
This chapter covers four topics of statistical tools. Lesson 1 discusses the basic terms
in statistics. Lesson 2 tackles measures of central tendency and measures of dispersion.
Lesson 3 focuses on hypothesis testing, and Lesson 4 discusses correlation and regression
analyses.
Learning Objective: At the end of the lesson, the students are expected to:
Discover This!
Statistics is a branch of mathematics that deals with the systematic method of
collecting, classifying, presenting, analyzing and interpreting quantitative or numerical data.
A variable is a characteristic of interest measurable on each and every individual in the
universe. It refers to a property that can take on different values or categories which cannot
be predicted with certainty.
A variable may also be called a data item. Age, sex, business income and expenses,
country of birth, capital expenditure, class grades, eye color and vehicle type are examples of
variables.
Types of Variable:
a) Dependent variable is what you measure in the experiment and what is affected during
the experiment. The dependent variable responds to the independent variable; it is the
“assumed effect” of another variable. It is an outcome of interest (e.g. a characteristic of
behavior) that is being observed and measured in order to assess the effects of the
independent variable. Independent variables, by contrast, are those that the researcher has
control over. This “control” may involve manipulating existing variables (e.g. modifying
existing methods of instruction) or introducing variables (e.g. adopting a totally new
method for some sections of a class) in the research setting. Whatever the case may be,
the researcher expects that the independent variable(s) will have some effect on (or
relationship with) the dependent variables.
b) Independent variable is the variable you have control over, what you can choose and
manipulate. It is usually what you think will affect the dependent variable. In some cases,
you may not be able to manipulate the independent variable. It is the “assumed cause” of
a problem, the assumed reason for a change. It is a variable that is examined in order to
determine its effects on an outcome of interest (the dependent variable). Examples: type
of incentive, instructional materials, pharmaceutical compound. The dependent variable, in
turn, shows the effect of manipulating or introducing the independent variables. For
example, if the independent variable is the use or non-use of a new language teaching
procedure, then the dependent variable might be students’ scores on a test of the content
taught using that procedure. In other words, the variation in the dependent variable
depends on the variation in the independent variable.
Examples of a population are all students of CHMSC-Talisay Campus enrolled in the first
semester of academic year 2020-2021, all members of a particular club, all mobile phones sold
by a certain cell shop in one month, all babies born in a particular year, etc.
The data (Asaad, 2004) are the quantities (numbers) or qualities (attributes) measured
or observed that are to be collected and/or analyzed. The two categories of data are
categorical and continuous data.
Categorical data are nominal and ordinal scales while continuous data are ratio and
interval scales.
Scale of Measurement
Variables can be classified according to how they are categorized, counted or
measured.
1) Nominal Scale
This is characterized by data that consist of names, labels, or categories only. The
data cannot be arranged in an ordering scheme. There is no criterion as to which values can
be identified as greater than or less than other values. It is used for labeling variables. There is
no intrinsic ordering to the categories. Any numbers are used only as names. Observations of
unordered variables constitute a very low level of measurement. Numbers have no
quantitative properties. They serve only to identify the class.
Examples are gender, mode of transportation, nationality, occupation, civil status,
course, specialization, etc.
2) Ordinal Scale
This involves data that may be arranged in some order, but differences between data
values either cannot be determined or are meaningless. An ordinal scale produces a distinct
ordering or arrangement of data in which the observations may be ranked based on some
criteria such as good, better and best. They represent position in a series. It is a scale in
which the classes stand in a relationship to one another that is expressed in terms of the
algebra of inequalities (less than or greater than).
Examples are pain level, social status, attitude towards a subject, satisfaction level,
etc.
3) Interval Scale
This is the same as the ordinal level, with an additional property to help determine
meaningful amounts of differences between data. Data at this level may lack an inherent zero
starting point. Variables on an interval scale are measured numerically and, like ordinal data,
carry an inherent ranking or ordering. If we have data with ordinal properties (>, <, =) and
can also measure the distance between two data items, we have an interval
measurement. It can determine or measure the distances between numbers. It is a
quantitative scale that requires a constant unit of measurement and permits the use of
arithmetic operations. The zero point in this scale is arbitrary. It does not represent the
complete absence of the attribute being measured.
Examples are temperature (in degree Celsius), test result, IQ, General weighted
average (GWA), etc.
4) Ratio Scale
This is an interval level modified to include the inherent zero starting point. The
difference and ratio of data are meaningful. This is also the highest scale of measurement.
Of the four scales of measurement, only the ratio scale is based on the number system in which
zero becomes meaningful. Arithmetic operations such as multiplication, division, addition and
subtraction take a rational interpretation. Ratio scale is used to measure several types of data
found in business such as cost, profit and inventory. These variables are expressed in ratio
measures. It is the highest level of measurement and allows for all basic arithmetic
operations. Data measured on a ratio scale have a fixed or non-arbitrary zero point. It is the
same as the interval scale, except that there is a true zero point.
Examples are income, profit, dimensions, weight, height, age (in years), number of
years in education, etc.
Sampling Techniques
Various sampling techniques or sample designs can be used by the researcher. The
choice of which technique to use will depend on the nature of the problem at hand, the
kind of population, and where the sample results will be applied.
The techniques can be grouped into how selections of items are made such as
probability sampling and non-probability sampling.
1) Probability Sampling.
In probability sampling, the sample is a proportion of the population, and the sample
is selected from the population in a systematic way in which every element of the
population has a chance of being included in the sample. The chances need not be equal: as
long as every element has a nonzero chance of being selected, the method is considered
probability sampling.
a) Random Sampling
This type of sampling is one in which everyone in the population of the study has an
equal chance of being selected to be included in the sample. For example, lottery method,
using the table of random numbers or computer-generated random numbers.
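The computer-generated option mentioned above can be sketched in Python. This is only an illustration: the population of 500 ID numbers, the sample size of 30, and the seed are assumptions, not values from the module.

```python
import random

# Hypothetical sampling frame: ID numbers 1..500 stand in for the population.
population = list(range(1, 501))

rng = random.Random(2020)  # fixed seed (an assumption) so the draw can be repeated
sample = rng.sample(population, k=30)  # 30 distinct IDs, each equally likely to be drawn

print(sorted(sample))
```

Because `sample` draws without replacement, no ID can be selected twice, which mirrors drawing names in the lottery method.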
b) Systematic Sampling
This is a technique of sampling in which every nth name in the list may be selected to
be included in the sample. For example, a mall owner wants to conduct a study on the
satisfaction of customers in terms of the service provided by the mall. So, he asked his
employees to conduct the study. Since it may be impossible to use random sampling, the
owner decided to use systematic sampling where every 5th customer who entered the mall
is surveyed and asked questions about their level of satisfaction.
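A minimal sketch of the "every 5th customer" rule, assuming a hypothetical log of 50 customers numbered in order of arrival:

```python
# Hypothetical arrival log: customers numbered in the order they entered the mall.
customers = list(range(1, 51))

k = 5  # sampling interval: every 5th customer, as in the mall example
# Index k - 1 is the 5th customer; stepping by k then picks the 10th, 15th, and so on.
surveyed = customers[k - 1::k]

print(surveyed)  # [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
```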
according to sex. Therefore, sex is your stratum. This is only possible if your population is almost
equally divided. As another example, if 100 belong to high family income, 4000 belong to
average family income and 900 belong to low family income, then economic status is
not applicable (or is impractical) to use as your strata.
d) Cluster Sampling
It is sometimes called area sampling because it is applied on geographical basis. A
cluster sampling will give more precise results particularly when each cluster contains a more
varied mixture and when one cluster is nearly like the other. This kind of sampling is used if
you have a huge population. For example, all teachers in Negros Occidental are your
population. You can group them according to cluster: Cluster 1 for the Division of Bacolod,
Cluster 2 for the Division of Silay, and so on.
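One way to picture cluster sampling is to select whole clusters at random and then take everyone inside them. The sketch below assumes a one-stage design; the two extra divisions and all teacher IDs are hypothetical placeholders.

```python
import random

# Division names follow the example; the extra divisions and teacher IDs are made up.
clusters = {
    "Division of Bacolod": ["B-01", "B-02", "B-03"],
    "Division of Silay": ["S-01", "S-02"],
    "Division A (hypothetical)": ["A-01", "A-02", "A-03", "A-04"],
    "Division B (hypothetical)": ["D-01", "D-02"],
}

rng = random.Random(4)  # fixed seed (an assumption) so the draw can be repeated
chosen = rng.sample(sorted(clusters), k=2)  # randomly pick 2 whole clusters

# Every teacher in a chosen cluster becomes a respondent.
respondents = [teacher for name in chosen for teacher in clusters[name]]
print(chosen, respondents)
```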
2) Non-Probability Sampling
In a non-probability sampling, the sample is not a proportion of the population and
there is no system in selecting the sample. The selection depends on the situation.
a) Purposive Sampling
It is based on certain criteria laid down by the researcher. People who satisfy the
criteria are interviewed. Purposive sampling is determining the target population of those
who will be taken for the study. The respondents are chosen on the basis of their knowledge
of the information desired. For example, you want to determine the performance of honor
students in their major subjects. In this case, the only respondents of your study are the
students who have honors, which makes this a purposive kind of sampling.
b) Convenience Sampling
It is a process of picking out people in the most convenient and fastest way to get
reactions immediately. For example, you want to conduct a survey about political views and
opinions in a particular barangay. Since it is impractical to use probability sampling,
convenience sampling is practically valid in this case. Again, it depends upon your kind of
respondents, your research design and statement of the problem.
c) Quota Sampling
In this type of sampling, a specified number of persons of certain types is included in the
sample. In quota sampling, many sectors of the population are represented. However, the
representation is doubtful because there are no guidelines in the selection of the respondents.
Unlike cluster sampling, in quota sampling some elements in the population have no chance
to be selected in the study.
Clarify Your Lesson! (Let’s Try This #2)
A. Give one example for each scale of measurement and discuss in not less than 3
sentences why the example is suited to it. (5 points each)
5 points Rubrics
5 The student clearly understands the concept. Minor mistakes and careless errors may
appear, insofar as they do not indicate a conceptual misunderstanding.
4 The student understands the main concepts, but has some minor yet non-trivial gaps
in their reasoning.
3 The student has partially understood the concepts. The student may have started
out correctly, but gone off on a tangent from the concepts.
2 The student has a poor understanding of the concepts.
1 The student did not understand the concepts.
0 The student wrote nothing or almost nothing.
3. What type of scale is being used for each of the following measurements?
a. Number of arithmetic problems correctly solved _______________
b. Class standing (i.e., one’s rank in the graduating class) _______________
c. Type of phobia _______________
d. Body temperature (in °F) _______________
e. Self-esteem, as measured by self-report questionnaire _______________
f. Annual income in dollars _______________
g. Theoretical orientation toward psychotherapy _______________
h. Place in a dog show _______________
i. Heart rate in beats per minute _______________
4. A psychologist records how many words participants recall from a list under three
different conditions: large reward for each word recalled, small reward for each word
recalled, and no reward.
8. In the same study, the amount of food eaten in one day is measured for each girl and
the researcher computes the average score for the 30, 13-year-old girls. The average
score is an example of a (parameter, statistic).
9. Out of 500 students of BS Psych, the desirable members of the club are only 10 per
section. 500 students is a (parameter, statistic).
10. 60% of products sold are food-related products and only 2% of these food-related
products are healthy. 2% is a (parameter, statistic).
D. Determine which kind of sampling was used in each of the following scenarios.
(Random, Stratified Random, Systematic, Cluster, Purposive, Convenience, Quota)
21. Chosen at random, 300 students who received a scholarship from CHMSC-Talisay
participated in a study. _______________
22. A survey to find out if teachers in CHMSC-Talisay are in favor of Outcome-Based
Education (OBE) will be conducted. To ensure that all faculty in each department are
represented, teachers will be divided into COEd, CAS, and CIT Departments.
_______________
23. You would like to know the level of satisfaction of students in terms of school canteen
service. You decided to interview students who are eating at the school
canteen every lunch time. _______________
24. To get the most popular online game, each field student-researcher is given a quota
of 50 students per course. _______________
25. In a study wherein a researcher wants to know what it takes to graduate with honors
in CHMSC-Talisay, the only people who can give the researcher firsthand advice are
the individuals who graduated with honors. _______________
E. Identify the independent and dependent variables in the following descriptions of
experiments. Underline the phrase or group of words that tells about independent
variable (dependent variable) and write IV (or DV) below it.
26. The more time people spend using social media, the less able they are to express
themselves in conversation.
27. Taking a nap in the afternoon makes people more relaxed and less irritable for the rest
of the day.
28. The relationship between the amount of violence that children see on television and
the amount of aggressive behavior they display.
29. Does attentiveness in class influence teacher effectiveness?
30. What are the effects of psychological variables on teacher’s productivity?
Date: September 14, 2020
Learning Outcomes: Completion of Let’s Try This and Gauge Your Learning Activities
Activities: File your activity in your red long clear book.
Learning Objectives: At the end of the lesson, the students are expected to:
1. Find the measures of central tendency and dispersion of the given data.
b. $\sum 5X_i$
c. $\sum X_i^2$
2. Make up your own set of at least five numbers and demonstrate that $\sum X_i^2 \neq \left(\sum X_i\right)^2$.
3. Round off the following numbers to two decimal places (assume digits to the right of those
shown are zero):
a. 144.0135 _______________
b. 67.245 _______________
c. 99.707 _______________
d. 13.345 _______________
e. 7.3451 _______________
f. 5.9817 _______________
g. 5.9977 _______________
4. Round off the following numbers to four decimal places (assume digits to the right of
those shown are zero):
a. .76995 _______________
b. 3.141627 _______________
c. 2.7182818 _______________
d. 6.89996 _______________
e. 1.000819 _______________
f. 22.55555 _______________
5. Round off the following numbers to one decimal place (assume digits to the right of those
shown are zero):
a. 55.555 _______________
b. 267.1919 _______________
c. 98.951 _______________
d. 99.95 _______________
e. 1.444 _______________
f. 22.14999 _______________
Discover This!
A measure of central tendency is any single value that is used to identify the “center”
of the data or the typical value. It is called measure of central tendency because when the
data points are arranged according to magnitude, it tends to lie centrally within the set.
1. Mean or Arithmetic Mean ($\bar{X}$)
Mean is the sum of all the values of the observations divided by the number of
observations:
$$\bar{X} = \frac{\sum x}{n},$$
where $n$ is the number of observations in the sample.
Example 1: What is the mean age (in years) of a group of children whose ages are 9, 11, 7, 10, 9,
8, 8, 7, 12, 7 and 13?
Solution:
$$\bar{X} = \frac{9 + 11 + 7 + 10 + 9 + 8 + 8 + 7 + 12 + 7 + 13}{11} = \frac{101}{11} \approx 9.18 \text{ years}$$
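The same computation can be checked with a few lines of Python. This is only a sketch of the arithmetic in Example 1; the variable names are illustrative.

```python
ages = [9, 11, 7, 10, 9, 8, 8, 7, 12, 7, 13]  # ages from Example 1

total = sum(ages)      # sum of all observations
n = len(ages)          # number of observations
mean_age = total / n   # arithmetic mean

print(total, n, round(mean_age, 2))  # 101 11 9.18
```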
2. Median ($\tilde{X}$)
Median is the positional middle of an array. In an array, one-half of the values precede
the median and one-half follow it. The first step in calculating the median, denoted by $\tilde{X}$, is
to arrange the data in an array. Let $X_{(i)}$ be the $i$th observation in the array, $i = 1, 2, \ldots, N$.
If $N$ is odd, the median position equals $\frac{N+1}{2}$, and the value of the $\left(\frac{N+1}{2}\right)$th observation in
the array is taken as the median, i.e.
$$\tilde{X} = X_{\left(\frac{N+1}{2}\right)}.$$
If $N$ is even, the mean of the two middle values in the array is the median, i.e.
$$\tilde{X} = \frac{X_{\left(\frac{N}{2}\right)} + X_{\left(\frac{N}{2}+1\right)}}{2}.$$
Example 2: Find the median of the given data set: 75, 67, 71, 75, and 72
Solution: First, arrange the data set in ascending order: 67, 71, 72, 75, 75
Since $N = 5$ is odd, we use $\tilde{X} = X_{\left(\frac{N+1}{2}\right)}$, hence
$$\tilde{X} = X_{\left(\frac{5+1}{2}\right)} = X_{(3)} = 72.$$
The third value in the array 67, 71, 72, 75, 75 is 72.
Therefore, $\tilde{X} = 72$.
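The two positional rules for the median (odd and even $N$) can be sketched as a small Python function. The function name is hypothetical; the only subtlety is converting the 1-based positions used in the formulas to Python's 0-based indices.

```python
def median_by_position(values):
    """Median via the positional rules: the ((N+1)/2)th value for odd N,
    the mean of the (N/2)th and (N/2 + 1)th values for even N."""
    array = sorted(values)  # arrange the data in an array
    n = len(array)
    if n % 2 == 1:
        # 1-based position (N+1)/2 corresponds to 0-based index (N+1)//2 - 1.
        return array[(n + 1) // 2 - 1]
    # 1-based positions N/2 and N/2 + 1 correspond to 0-based indices N/2 - 1 and N/2.
    return (array[n // 2 - 1] + array[n // 2]) / 2

print(median_by_position([75, 67, 71, 75, 72]))  # 72
```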
3. Mode ($\hat{X}$)
Mode is the observed value that occurs most frequently. It locates the point where the
observation values occur with the greatest density. It does not always exist, and if it does, it
may not be unique. A data set is said to be unimodal if there is only one mode, bimodal if
there are two modes, and multimodal if there are three or more. It is not affected by extreme
values. It can be used for qualitative as well as quantitative data.
Example 4: Find the mean, median, and mode of the following ages in years.
1.) 3, 4, 5, 5, 6, 7, 9, 10, 14
2.) 7, 8, 9, 9, 10, 10, 11, 12
Solution:
1.) 3, 4, 5, 5, 6, 7, 9, 10, 14
Mean:
$$\bar{X} = \frac{\sum x}{n} = \frac{3+4+5+5+6+7+9+10+14}{9} = \frac{63}{9} = 7 \text{ years}$$
Median: Since $N$ is 9 (which is odd), use the first formula:
$$\tilde{X} = X_{\left(\frac{N+1}{2}\right)} = X_{\left(\frac{9+1}{2}\right)} = X_{\left(\frac{10}{2}\right)} = X_{(5)}$$
What is the 5th score in the ordered distribution? The answer is 6. Therefore, the median is 6.
Mode: The mode is 5 since it has the highest frequency (it appears twice).
2.) 7, 8, 9, 9, 10, 10, 11, 12
Mean:
$$\bar{X} = \frac{\sum x}{n} = \frac{7+8+9+9+10+10+11+12}{8} = \frac{76}{8} = 9.5 \text{ years}$$
Median: Since $N$ is 8 (which is even), use the second formula:
$$\tilde{X} = \frac{X_{\left(\frac{N}{2}\right)} + X_{\left(\frac{N}{2}+1\right)}}{2} = \frac{X_{(4)} + X_{(5)}}{2} = \frac{9+10}{2} = 9.5$$
Therefore, the median is 9.5.
Mode: The modes are 9 and 10 since they have the highest frequency (each appears
twice). It is bimodal.
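For checking answers, Python's standard `statistics` module provides all three measures. The sketch below reruns both data sets of Example 4; `multimode` returns every value tied for the highest frequency, so it also reports the bimodal case.

```python
import statistics

data_sets = {
    "1.)": [3, 4, 5, 5, 6, 7, 9, 10, 14],
    "2.)": [7, 8, 9, 9, 10, 10, 11, 12],
}

results = {}
for label, ages in data_sets.items():
    results[label] = (
        statistics.mean(ages),       # arithmetic mean
        statistics.median(ages),     # positional middle of the sorted data
        statistics.multimode(ages),  # all values tied for the highest frequency
    )
    print(label, results[label])
```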
Weighted mean ($\bar{X}_w$) is the sum of the mean of each group multiplied by its respective
weight, divided by the sum of the weights. (For the ordinary mean, the weights of all values
are equal.) An example of a weighted mean is computing your weighted average in a
semester to determine if you belong to the dean’s list. Each of your grades has a corresponding
number of units (for example, GECMAT is 3 units, a major subject is 4 or 5 units, and so
on).
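The grade-and-units case can be sketched as follows. The grades and unit loads are hypothetical and chosen only to show the computation; the helper name `weighted_mean` is also an assumption.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grades and unit loads (not taken from the module).
grades = [1.5, 2.0, 1.75]
units = [3, 4, 5]

gwa = weighted_mean(grades, units)
print(round(gwa, 2))  # 1.77
```

With equal weights the function reduces to the ordinary mean, which matches the remark that the plain mean treats every value as equally weighted.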
Example 5: Francis answered 20 calculus problems. He spent 1½ hours for the first 6 problems;
45 minutes for the next 3; and 3 hours for the last 11 problems. What was the average time
(in minutes) he spent per problem for the 20 problems?
Solution: This problem requires the weighted average time. Each set of problems has a mean
time per problem ($x_i$), and its weight ($w_i$) is the number of problems in the set. The mean
times are $90/6 = 15$, $45/3 = 15$, and $180/11$ minutes per problem.
$$\bar{X}_w = \frac{x_1 w_1 + x_2 w_2 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n} = \frac{15(6) + 15(3) + \frac{180}{11}(11)}{6 + 3 + 11} = \frac{90 + 45 + 180}{20} = \frac{315}{20} = 15.75 \text{ minutes}$$
Measures of Dispersion
Measures of dispersion/variability indicate the extent to which individual items in a
series are scattered about an average. It is used to determine the extent of the scatter so that
steps may be taken to control the existing variation. It is also used as a measure of reliability
of the average value.
1. Range
The range of a set of measurements is the difference between the largest and the
smallest values. Range (𝑅) = Maximum value − Minimum value
Example 6: The IQ scores of 5 members of the CHMSC men’s basketball varsity team are 108,
112, 127, 116, and 113. Find the range.
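Solution: R = 127 − 108 = 19
Mean:
$$\bar{X} = \frac{3+8+5+4+4}{5} = \frac{24}{5} = 4.8$$
Variance:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{14.8}{5-1} = 3.7$$
Standard Deviation:
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} = \sqrt{3.7} \approx 1.92$$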
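The variance and standard deviation above can be reproduced step by step in Python. This sketch uses the five values that appear in the computation (3, 8, 5, 4, 4) and divides by $n - 1$, as in the sample formulas.

```python
import statistics

scores = [3, 8, 5, 4, 4]  # values appearing in the worked computation

mean = statistics.mean(scores)                            # 24 / 5 = 4.8
squared_deviations = [(x - mean) ** 2 for x in scores]    # (X_i - mean)^2 for each value
variance = sum(squared_deviations) / (len(scores) - 1)    # divide by n - 1
std_dev = variance ** 0.5                                 # square root of the variance

print(round(sum(squared_deviations), 2), round(variance, 2), round(std_dev, 2))
```

`statistics.variance` and `statistics.stdev` also use the $n - 1$ divisor, so they can serve as a cross-check on hand computations.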
Clarify Your Lesson!
1. Select the measure of central tendency (mean, median, or mode) that would be most
appropriate for describing each of the following hypothetical sets of data and give your
reason.
a. Religious preferences of delegates to the United Nations.
b. Heart rates for a group of women before they start their first aerobics class.
c. Types of phobias exhibited by patients attending a phobia clinic.
d. Amounts of time participants spend solving a classic cognitive problem, with some of
the participants unable to solve it.
e. Height in inches for a group of boys in the first grade
2. A veterinarian is interested in the life span of golden retrievers. She recorded the age at
death (in years) of the retrievers treated in her clinic. The ages were 12, 9, 11, 10, 8, 14,
12, 1, 9, 12.
a. Calculate the mean, median, and mode for age at death.
b. After examining her records, the veterinarian determined that the dog that had died
at 1 year was killed by a car. Recalculate the mean, median, and mode without that
dog’s data.
c. Which measure of central tendency in part b changed the most, compared to the
values originally calculated in part a?
Challenge Yourself!
(Let’s Try This #2)
Solve the following problems with complete process of the solution.
A study was conducted to determine the level of awareness of the residents of a certain
municipality on the causes of hypertension. The accompanying table shows the results
with respect to the gender of the respondents.
9. Family history of hypertension 2.6 2.7
10. Present medical conditions like kidney 2.5 2.5
disorder, diabetes mellitus and others
Average
a. Find the average for the male and for the female.
b. Find the weighted mean, $\bar{X}_w$, for each indicator, using $n$ as the weight.
c. Find the standard deviation for the male and female.
Study Schedule
Topic: Week 4 Data Management, Module 4.3 Hypothesis Testing

Date: September 15-16, 2020
Learning Outcomes:
1. Formulate a hypothesis utilizing the five steps on testing of hypothesis
2. Perform a hypothesis testing
3. Test the significant difference between groups
Activities:
Explore: Discover This!
Engage: Let’s Try This!
Explain: Clarify Your Lesson!
Elaborate: Challenge Yourself!
Evaluate: Gauge Your Learning!

Date: September 16, 2020
Learning Outcomes: Completion of Let’s Try This and Gauge Your Learning Activities
Activities: File your activity in your red long clear book.
Learning Objectives: At the end of the lesson, the students are expected to:
Discover This!
Hypothesis testing deals with the problem of testing specific assertions about the
population regarding the value of the unknown parameter or the distributional properties of
the population. The statement is stated in the form of a hypothesis and the statistical tool
used to decide whether or not to reject said statement is a test of hypothesis.
This is the process of making an inference or generalization on population parameters
based on the results of the study on samples. It is a procedure for deciding if the null
hypothesis should be rejected in favor of an alternative hypothesis, or will not be rejected. It
is a statistical procedure that allows researchers to use sample data to draw inferences about
the population of interest. It is a statistical method that uses sample data to evaluate a
hypothesis about a population.
Definition:
1. Statistical hypothesis – is a statement or conjecture concerning one or more populations.
2. Null hypothesis (H0) – is the hypothesis that is being tested; it represents what the
experimenter doubts to be true.
3. Alternative hypothesis (Ha) – is the operational statement of the theory that the
experimenter believes to be true and wishes to prove.
4. Type I error – is the error made by rejecting the null hypothesis when it is true. The
probability of a Type I error is α (alpha).
5. Type II error – is the error made by accepting (not rejecting) the null hypothesis when
it is false. The probability of a Type II error is denoted by β (beta).
6. Level of significance (α) - is the maximum probability of Type I error the researcher is
willing to commit.
Step 1: Determine the variable of interest $X$. State the null hypothesis ($H_0$) and alternative
hypothesis ($H_a$) in words and in symbols.
Null hypothesis ($H_0$) is the hypothesis that the researcher hopes to reject. It always
contains the “=” sign.
The null hypothesis states that there is no statistically significant difference (effect,
change, relationship) between the variables. (It uses the “=” symbol, or sometimes ≤ or ≥.)
The population mean value is equal to a hypothesized (standard) value. (The new
vaccine is as effective as the one commonly used.) ($\mu = 24$ hours)
There is no significant difference between the two parameters. (Male students are
as intelligent as female students in Mathematics.) ($\mu_M = \mu_F$)
There is no significant relationship between two variables. (There is no significant
relationship between the effect of a new antianxiety drug and the heart rate of a person.)
The experimental treatment on a group of students has had no effect on its
performance. ($\mu_{after} = \mu_{before}$ or $\mu_d = 0$)
Alternative hypothesis ($H_a$) challenges $H_0$. It never contains the “=” sign; it uses “<”, “>”, or
“≠”. It generally represents the idea which the researcher wants to prove. It is a statement that
there is a relationship between variables.
It is a statement specifying that the population parameter is some value other than
the one specified under the null hypothesis. It states that there is a change, a difference, or a
relationship for the general population. In the context of an experiment, $H_a$ predicts that the
independent variable (treatment) does have an effect on the dependent variable.
The population mean value is greater than (less than, not equal to) a hypothesized
(standard) value. (The new vaccine is not as effective as (is more / less effective than) the one
commonly used.) ($\mu \neq 24$ hours, $\mu > 24$, $\mu < 24$)
There is a significant difference between the two parameters. (Male students are not as
intelligent as (are more / less intelligent than) female students in Mathematics.)
($\mu_M \neq \mu_F$, $\mu_M > \mu_F$, $\mu_M < \mu_F$)
There is a significant relationship between two variables. (There is a significant
relationship between the effect of a new antianxiety drug and the heart rate of a person.)
The experimental treatment on a group of students has had an effect on its
performance. ($\mu_{after} > \mu_{before}$ or $\mu_d > 0$)
the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is
more conservative than the 0.05 level.
A test statistic is a statistic whose value is calculated from sample measurements and
on which the statistical decision will be based.
The critical region or rejection region is the set of values of the test statistic for which
the null hypothesis will be rejected. The acceptance region is the set of values of the test
statistic for which the null hypothesis will not be rejected. The acceptance and rejection
regions are separated by a critical value of the test statistic.
c) Critical region
The region of rejection can be found in terms of critical z-scores, the z-scores that cut
off an area of the normal distribution that is exactly equal to alpha. It has a size equal to $\alpha$. It
covers the range of values of the test value that indicates the difference was probably not due
to chance and that $H_0$ should be rejected. The noncritical (acceptance) region has a size equal
to $1 - \alpha$.
Decision rule:
Critical value approach:
Two-tailed test: Reject $H_0$ if |computed value (CV)| ≥ |critical value|; otherwise, do not
reject $H_0$.
One-tailed right test: Reject $H_0$ if CV ≥ critical value; otherwise, do not reject $H_0$.
One-tailed left test: Reject $H_0$ if CV ≤ critical value; otherwise, do not reject $H_0$.
p-value approach: Reject $H_0$ if the p-value is less than or equal to $\alpha$ ($p \leq \alpha$); otherwise, do not
reject $H_0$.
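The decision rules above can be written as two small Python functions. This is a sketch, not part of the module: the function names and the computed z value of 2.10 are hypothetical, and the critical value is taken from the standard normal distribution.

```python
from statistics import NormalDist

def reject_by_critical_value(computed, critical, tail):
    """Critical value approach; tail is 'two', 'right', or 'left'."""
    if tail == "two":
        return abs(computed) >= abs(critical)
    if tail == "right":
        return computed >= critical
    if tail == "left":
        return computed <= critical
    raise ValueError("tail must be 'two', 'right', or 'left'")

def reject_by_p_value(p_value, alpha):
    """p-value approach: reject H0 when p <= alpha."""
    return p_value <= alpha

# Illustration with a z statistic (the value 2.10 is hypothetical).
alpha = 0.05
z = 2.10
critical = NormalDist().inv_cdf(1 - alpha)  # right-tail critical z, about 1.645
p_value = 1 - NormalDist().cdf(z)           # right-tail area beyond z

print(reject_by_critical_value(z, critical, "right"), reject_by_p_value(p_value, alpha))
```

For the same test and the same $\alpha$, the two approaches lead to the same decision; they differ only in whether the comparison is made on the scale of the test statistic or on the scale of probability.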
t-test for two independent samples is a test of difference between two independent sample
groups. The two means are compared (when $\sigma$ is unknown):
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$
Example 3: Identify the variable of interest that you are going to use to represent information.
Formulate the appropriate null hypothesis (𝐻0 ) and the appropriate alternative hypothesis
(𝐻𝑎 ).
a. The soft drink dispenser of a fast food center was just readjusted. The manager,
wanting to know if the dispenser is really in good condition, got a sample of 50 cups
filled by the dispenser. He would only classify the dispenser as “in good condition”
(and therefore it need not be readjusted again) if the average fill per cup of the
dispenser is 8 ounces.
b. A common measure of intelligence is the intelligence quotient (IQ) test (Castles, 2012;
Spinks et al., 2007) in which scores in the general healthy population are
approximately normally distributed with 100 ± 15 ($\mu \pm \sigma$). Suppose we select a sample
of 100 graduate students to identify if the IQ of those students is significantly
different from that of the general healthy adult population. In this sample, we record
a sample mean equal to 103 (M = 103). Determine the null hypothesis and alternative
hypothesis.
Solution: Variable of interest: IQ score
$H_0: \mu = 100$: The mean IQ score is equal to 100 in the population of
graduate students.
$H_a: \mu \neq 100$: The mean IQ score is not equal to 100 in the population
of graduate students.
TEST FOR ONE SAMPLE GROUP (POPULATION MEAN $\mu$ vs SAMPLE MEAN $\bar{X}$)
where 𝑋̅ is the sample mean, 𝜇 is the population mean, 𝜎 is the population standard
deviation, 𝑠 is the sample standard deviation, 𝑛 is the sample size, 𝑧 is the z-value and 𝑡 is
the t-value.
ONE-SAMPLE Z-TEST (𝒏 ≥ 𝟑𝟎)
A one-sample 𝒛-test works when you have a single group of people (or things) and you
wonder whether they are different in some way from some hypothesized population. The
more common research question of interest is whether a sample matches the characteristics
that one would expect if that sample wasn’t different in some way from this imagined
population.
Example 4: According to a dietary study, high sodium intake may be related to ulcers, stomach
cancer, and migraine headaches. The human requirement for salt is only 220 milligrams per
day, with a standard deviation of 24.5 milligrams, which is surpassed in most single servings of
ready-to-eat cereals. If a random sample of 50 similar servings of a certain cereal has a mean
sodium content of 244 milligrams, does this suggest at the 0.05 level of significance that the
average sodium content for a single serving of such cereal is greater than 220 milligrams?
Assume the distribution of sodium contents to be normal.
Solution:
Step 1: Variable of interest 𝑿: sodium content of a certain cereal (in milligrams)
𝑯𝟎 : The mean sodium content of a certain cereal is 220 mg. (𝝁 = 220 mg)
𝑯𝒂 : The mean sodium content of a certain cereal is greater than 220 mg. (𝝁 > 220 mg)
Step 2: 𝜶 = 0.05
Step 3:
a) Type of test: One-tailed right test (directional)
b) z-test: Critical value = 1.65
c) Critical region
d) Decision rule: Reject 𝑯𝟎 if the computed z-value is greater than or equal to 1.65 (z ≥ 1.65); otherwise, do not reject 𝑯𝟎.
Step 4: Statistical tool and computation of the computed value.
Given: 𝝁 = 220 mg, 𝝈 = 24.5 mg, 𝑿̅ = 244 mg, 𝑛 = 50 servings
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{244 - 220}{24.5 / \sqrt{50}} = 6.93$$
Step 5: Decision: Reject 𝑯𝟎 , since 6.93 > 1.65.
Conclusion: Therefore, it does suggest at the 0.05 level of significance that the mean
sodium content of a certain cereal is greater than 220 mg.
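The arithmetic in Example 4 can be checked with a short script. This is a minimal sketch, not part of the original module: the function name one_sample_z is illustrative, and the critical value 1.65 is simply the one quoted in Step 3.

```python
import math


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Values given in Example 4
z = one_sample_z(sample_mean=244, pop_mean=220, pop_sd=24.5, n=50)

# Right-tailed test at alpha = 0.05; critical value quoted in Step 3
critical_value = 1.65
reject_h0 = z >= critical_value

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

The computed z rounds to 6.93, which exceeds 1.65, so the script reaches the same decision as Step 5.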
ONE-SAMPLE T-TEST (𝒏 < 𝟑𝟎)
Example 5: An expert typist can type 65 words per minute. A random sample of 16 applicants
took the typing test, and an average speed of 62 words per minute with a standard deviation
of 8 words was obtained. Can we say that the applicants' performance is below the standard
at the 0.05 level?
Solution:
1. Variable of interest: typing speed (in words per minute)
Hypothesis: 𝐻0 : 𝜇 = 65 words per minute (claim)
𝐻𝑎 : 𝜇 < 65 words per minute
Decision rule: Reject 𝐻0 if the test or computed value is less than or equal to − 1.753
(𝑡 ≤ −1.753), otherwise, do not reject 𝐻0 .
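The computation for Example 5 can be sketched as follows, assuming the usual one-sample t statistic t = (X̄ − μ) / (s / √n) with df = n − 1 = 15. The function name one_sample_t is only illustrative, and the critical value −1.753 is taken from the decision rule above.

```python
import math


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return t = (sample_mean - pop_mean) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Values given in Example 5
t_value = one_sample_t(sample_mean=62, pop_mean=65, sample_sd=8, n=16)

# Left-tailed test at alpha = 0.05 with df = 15 (from the decision rule)
critical_value = -1.753
reject_h0 = t_value <= critical_value

print(f"t = {t_value:.2f}, reject H0: {reject_h0}")
```

With these values t = −1.5, which is not less than or equal to −1.753, so under this sketch 𝐻0 would not be rejected.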
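For two independent sample groups, the t-value is computed as
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)(s_1)^2 + (n_2 - 1)(s_2)^2}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$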
where
𝑋̅1 is the mean of sample 1
𝑋̅2 is the mean of sample 2
𝑠1 is the standard deviation of sample 1
𝑠2 is the standard deviation of sample 2
𝑛1 is the sample size of sample 1
𝑛2 is the sample size of sample 2
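The formula above translates directly into code. The sketch below is an illustration only: the function name pooled_t and the summary statistics passed to it are hypothetical and do not come from the module.

```python
import math


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Independent-samples t using the pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error, df


# Hypothetical summary statistics, chosen only to keep the arithmetic simple
t_value, df = pooled_t(mean1=50, mean2=46, sd1=4, sd2=4, n1=8, n2=8)

print(f"t = {t_value:.2f}, df = {df}")
```

Here the pooled variance is 16, the standard error is 4 × 0.5 = 2, and t = 4 / 2 = 2.00 with df = 14. The computed t is then compared with the critical value from the t-distribution table.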
Example 6: Samples of the weights of male and female students were obtained with the
following sample statistics. Using α = 0.01, do the sample means provide sufficient evidence
that the weights of male and female students are equal?
Sample    𝑛     𝑋̅           𝑠
Female    12    112.8 lbs    12.8
Male      12    148.3 lbs    7.8
We will apply the five steps in hypothesis testing in this problem. But before that, let us first
determine the groups.
Let: 𝑋1 be the group of female students
     𝑋2 be the group of male students
Solution:
Step 1: Variable of interest: weight of students (in lbs.)
(Null hypothesis) 𝐻0 : The weights of female and male students are the same. 𝜇1 = 𝜇2
(Alternative hypothesis) 𝐻𝑎 : The weights of female and male students are different. 𝜇1 ≠ 𝜇2
Step 2: Level of Significance 𝛼
𝛼 = 1% or 0.01
𝛼/2 = 0.01/2 = 0.005
Step 3:
Type of test: Two-tailed test
Test statistics: Independent t-test
(Degrees of freedom) df = 𝑛1 + 𝑛2 − 2 = 12 + 12 – 2 = 22
Critical value 𝐶𝑉 = ±2.82 (Refer to Appendix A)
Decision rule: Reject 𝐻0 if the absolute value of the test or computed value is greater than
or equal to 2.82 (|𝑡| ≥ 2.82); otherwise, do not reject 𝐻0 .
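Step 4: Computation of the test value, using the sample statistics given for the two groups:
$$t = \frac{112.8 - 148.3}{\sqrt{\frac{(12 - 1)(12.8)^2 + (12 - 1)(7.8)^2}{12 + 12 - 2}} \sqrt{\frac{1}{12} + \frac{1}{12}}} = \frac{-35.5}{4.327} \approx -8.20$$
Step 5:
Decision: Since |−8.20| > 2.82, reject 𝐻0 .
Conclusion: Therefore, the sample means do not support the claim that the weights of male
and female students are equal. The mean weights of female and male students differ
significantly at the 0.01 level.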
1. A psychiatrist is testing a new antianxiety drug, which seems to have the potentially
harmful side effect of lowering the heart rate. For a sample of 50 medical students whose
pulse was measured after 6 weeks of taking the drug, the mean heart rate was 70 beats
per minute (bpm). If the mean heart rate for the population is 72 bpm with a standard
deviation of 12, can the psychiatrist conclude that the new drug lowers heart rate
significantly?
2. Imagine that you are testing a new drug that seems to raise the number of T cells in the
blood and therefore has enormous potential for the treatment of disease. After treating
100 patients, you find that their mean (𝑋̅) T cell count is 29.1. Assume that 𝜇 and
𝜎 (hypothetically) are 28 and 6, respectively.
a. Test the null hypothesis at the .05 level, two-tailed.
b. Test the same hypothesis at the .01 level, two-tailed.
c. Describe in practical terms what it would mean to commit a Type I error in this
example.
d. Describe in practical terms what it would mean to commit a Type II error in this
example.
e. How might you justify the use of .01 for alpha in similar experiments?
At the 0.05 significance level, test the claim that the two production methods yield
batteries with the same mean. Based on the results, if you were buying a battery for your
car, would you prefer a battery manufactured by the traditional method or the
experimental method?
A. State the null and alternative hypotheses and identify the following as one-tailed or two-
tailed.
1. A researcher studies gambling in young people. She thinks those who gamble spend
more than $30 per day.
2. A researcher wishes to see if police officers whose spouses work in law enforcement
have a lower score on a work stress questionnaire than the average score of 120.
3. A teacher feels that if an online textbook is used for a course instead of a hardback
book, it may change the students’ scores on a final exam. In the past, the average final
exam score for the students was 83.
4. A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication. The mean pulse rate for the population
under study is 82 beats per minute.
5. A chemist invents an additive to increase the life of an automobile battery. The mean
lifetime of the automobile battery without the additive is 36 months.
B. Carry out a complete test of hypothesis for the following problems. Show complete
solution.
1. According to a dietary study, high sodium intake may be related to ulcers, stomach
cancer, and migraine headaches. The human requirement for salt is only 220
milligrams per day, which is surpassed in most single servings of ready-to-eat cereals.
If a random sample of 36 similar servings of a certain cereal has a mean sodium
content of 244 milligrams and a standard deviation of 24.5 milligrams, does this
suggest at the 0.05 level of significance that the average sodium content for a single
serving of such cereal is greater than 220 milligrams? Assume the distribution of
sodium contents to be normal.
2. Self-Esteem Scores
In a study of a group of women science majors who remained in their profession and
a group who left their profession within a few months of graduation, the researchers
collected the data shown here on a self-esteem questionnaire. At α = 0.05, can it be
concluded that there is a difference in the self-esteem scores of the two groups?
Leavers Stayers
𝑋̅1 = 3.05 𝑋̅2 = 2.96
𝑠1 = 0.75 𝑠2 = 0.82
𝑛1 = 41 𝑛2 = 41
Learning Objectives: At the end of the lesson, the students are expected to:
When performing research studies, scientists often wish to know whether two
variables are related. If the variables are determined to be related, a scientist may then wish
to find an equation that can be used to model the relationship.
Let’s Try This!
1. Twenty students take a Spanish
written test. The scatter diagram
shows their marks and the number of
Spanish lessons they had missed
during the year.
a) Write down the mark of the
student who missed most
lessons.
b) Write down the number of
lessons missed by the student
having a mark of 36.
c) One student missed many
lessons but still had a high
mark in the test. Write down
the mark and number of
lessons missed by this
student.
d) The teacher looks at the scatter diagram and concludes: "The more Spanish lessons a
student attends, the higher their mark in the written test." Does the information in
the scatter diagram support this conclusion? Give a reason for your answer.
Discover This!
Correlation Analysis
Correlation analysis determines the strength and degree of the relationship between
two variables and tests whether that relationship is significant, while
regression analysis predicts the dependent variable from the independent variable when a
relationship exists between the two variables.
Correlation Analysis is concerned with the relationship in the changes of the given
variables. The relationship can be computed and may be shown in a scatter diagram. If y
increases as x increases the correlation is called positive or direct correlation. If y increases as
x decreases the correlation is negative or inverse correlation.
If there is no relationship indicated between x and y variables then we say that there
is no correlation between them. There are degrees of correlation between two variables. The
value of r ranges from – 1 to + 1, the degrees of correlation are the following:
±0.01 to ±0.25 Low positive/negative correlation
0.0 No correlation
Formula for correlation (Pearson 𝑟):
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
Example 1: A study was made to determine the relationship existing between the grades in
Trigonometry and Drafting 101. A random sample of 10 first year BSIT students of Carlos
Hilado Memorial State College was taken, and the following are the results of the sampling.
Student No.        1   2   3   4   5   6   7   8   9   10
Trigonometry (𝑥)   75  83  80  89  77  78  92  86  93  84
Drafting (𝑦)       78  87  78  92  76  81  89  89  91  84
Is the obtained relationship significant at the 0.05 level?
Solution:
Step 1: Variable of interest: grades in Trigonometry and Drafting
𝐻0 : There is no significant relationship between the grades in Trigonometry and
Drafting 101. (𝜌 = 0)
𝐻𝑎 : There is a significant relationship between the grades in Trigonometry and
Drafting 101. (𝜌 ≠ 0)
Step 2: 𝛼 = 0.05
Step 3: Type of test: Two-tailed test (Why two tailed test?)
Test statistic: Pearson Product Moment Correlation Coefficient 𝑟, with
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
(Degree of freedom) df = 𝑛 − 2 = 10 – 2 = 8
Since, it is a two-tailed test, we will divide the level of significance by 2.
Therefore, 𝛼/2 = 0.05/2 = 0.025 (since we are testing for the possibility of a relationship in
both directions).
Using the t-distribution table with df = 8 and 𝛼/2 = 0.025, the critical value CV is ±2.31.
(Refer to Appendix A)
Decision rule: Reject 𝐻0 if |𝑡| ≥ 2.31; otherwise, do not reject 𝐻0 .
Step 4: Therefore,
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
$$r = \frac{10(71030) - (837)(845)}{\sqrt{[10(70413) - (837)^2][10(71717) - (845)^2]}}$$
$$r = \frac{710300 - 707265}{\sqrt{[704130 - 700569][717170 - 714025]}}$$
$$r = \frac{3035}{\sqrt{(3561)(3145)}} = \frac{3035}{\sqrt{11199345}} = 0.907$$
(High positive correlation)
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.907\sqrt{\frac{10 - 2}{1 - (0.907)^2}} = 0.907\sqrt{\frac{8}{0.177351}} = 6.09$$
Step 5: Decision: Since |6.09| > 2.31, reject 𝐻0 .
Conclusion: Therefore, there is a significant relationship between the
grades in Trigonometry and Drafting 101 at the 0.05 level.
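The sums, the value of r, and the computed t in Example 1 can be reproduced with a few lines of code. This is a minimal sketch using only the grades listed in the example; the function names pearson_r and t_from_r are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the raw-score formula used in the text."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_from_r(r, n):
    """Computed t-value for testing whether the correlation is significant."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Grades from Example 1
trigonometry = [75, 83, 80, 89, 77, 78, 92, 86, 93, 84]
drafting = [78, 87, 78, 92, 76, 81, 89, 89, 91, 84]

r = pearson_r(trigonometry, drafting)
t_value = t_from_r(r, len(trigonometry))

print(f"r = {r:.3f}, t = {t_value:.2f}")
```

The script carries r at full precision instead of rounding it to 0.907 first, so later digits of t may differ slightly from a hand computation, but the rounded results agree with r = 0.907 and t = 6.09.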
Regression Analysis
This topic discusses the simplest type of prediction, that of predicting one variable (𝑦)
with the knowledge of another variable (𝑥). Prediction refers to the process of calculating
scores of the criterion variable (𝑦), on the basis of the knowledge of the predictor variable
(𝑥). The concepts of prediction and correlation are closely related. A more accurate prediction
of 𝑦 could be made from 𝑥 if the correlation coefficient is of greater absolute value.
A simple technique for prediction is through linear regression analysis, which utilizes an
equation of the form
𝑦 = 𝑎 + 𝑏𝑥
Where: 𝑥 is the predictor variable (independent variable)
𝑦 is the criterion variable (dependent variable)
𝑎 is the 𝑦-intercept
𝑏 is the slope of the line
This is the equation of the line that is fitted to the given data. It is called
the least-squares line or the simple regression line. In this method, 𝑦 is called the dependent
variable and 𝑥 the independent variable. The slope of the regression line for predicting 𝑦
from 𝑥 is represented by 𝑏, and the point where the line intersects the 𝑦-axis, or simply
the 𝑦-intercept, is represented by 𝑎. Both can be determined through the use of the following
formulas:
$$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$$
$$a = \bar{y} - b\bar{x}$$
Students No. 1 2 3 4 5 6 7 8 9 10
Hours Spent (𝑋) 2.5 2.75 1.5 1.0 3.0 2.5 1.25 3.5 1.5 2.0
Achievement Grade (𝑌) 89 88 82 77 90 91 80 93 81 86
Solution:
Step 2: 𝛼 = 0.05
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
2.5 89 222.5 6.25 7921
2.75 88 242 7.5625 7744
1.5 82 123 2.25 6724
1.0 77 77 1 5929
3.0 90 270 9 8100
2.5 91 227.5 6.25 8281
1.25 80 100 1.5625 6400
3.5 93 325.5 12.25 8649
1.5 81 121.5 2.25 6561
2.0 86 172 4 7396
∑ 𝑥 = 21.5 ∑ 𝑦 = 857 ∑ 𝑥𝑦 = 1881 ∑ 𝑥 2 = 52.375 ∑ 𝑦 2 = 73705
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
$$r = \frac{10(1881) - (21.5)(857)}{\sqrt{[10(52.375) - (21.5)^2][10(73705) - (857)^2]}}$$
$$r = \frac{18810 - 18425.5}{\sqrt{[523.75 - 462.25][737050 - 734449]}}$$
$$r = \frac{384.5}{\sqrt{(61.5)(2601)}} = \frac{384.5}{\sqrt{159961.5}} = 0.961$$
(High positive correlation)
Then, let us determine the computed t-value.
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.961\sqrt{\frac{10 - 2}{1 - (0.961)^2}} = 0.961\sqrt{\frac{8}{0.076479}} = 9.83$$
REGRESSION
To determine the equation of linear regression 𝑦 = 𝑎 + 𝑏𝑥,
where:
𝑦 is the achievement grade
𝑥 is the number of hours spent in studying
𝑎 is the 𝑦-intercept
𝑏 is the slope of the line
𝑥 𝑦 𝑥2 𝑦2 𝑥𝑦
2.5 89 6.25 7921 222.5
2.75 88 7.5625 7744 242
1.5 82 2.25 6724 123
1.0 77 1 5929 77
3.0 90 9 8100 270
2.5 91 6.25 8281 227.5
1.25 80 1.5625 6400 100
3.5 93 12.25 8649 325.5
1.5 81 2.25 6561 121.5
2.0 86 4 7396 172
∑ 𝑥 = 21.5 ∑ 𝑦 = 857 ∑ 𝑥 2 = 52.375 ∑ 𝑦 2 = 73705 ∑ 𝑥𝑦 = 1881
𝑋̅ = 2.15 𝑌̅ = 85.7
Let us solve for the value of 𝑏.
$$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} = \frac{10(1881) - (21.5)(857)}{10(52.375) - (21.5)^2} = \frac{18810 - 18425.5}{523.75 - 462.25} = \frac{384.5}{61.5} = 6.25$$
So, 𝑏 = 6.25
For 𝑦̅ and 𝑥̅ , we have
$$\bar{y} = \frac{\sum y}{n} = \frac{857}{10} = 85.7$$
$$\bar{x} = \frac{\sum x}{n} = \frac{21.5}{10} = 2.15$$
Therefore,
𝑎 = 𝑦̅ − 𝑏𝑥̅
= 85.7 − 6.25(2.15)
= 72.26
Hence, 𝑏 = 6.25 and 𝑎 = 72.26.
Using 𝑦 = 𝑎 + 𝑏𝑥, the equation of linear regression is
𝑦 = 72.26 + 6.25𝑥.
Question: What is the predicted achievement grade of a student who spent 2.25 hours in
studying the subject?
Solution: Substitute 𝑥 = 2.25 in the equation of linear regression and solve for 𝑦.
𝑦 = 72.26 + 6.25𝑥
= 72.26 + 6.25 (2.25)
= 86.32
Therefore, a student who spent 2.25 hours in studying the subject has a predicted
achievement grade of 86.32.
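The slope, intercept, and prediction can be verified with a short script. This is a sketch under the same formulas for 𝑏 and 𝑎 given earlier; the function name fit_line is illustrative, and the data are the hours and grades from the table.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = y-bar minus b times x-bar
    return a, b


# Hours spent studying and achievement grades from the table
hours = [2.5, 2.75, 1.5, 1.0, 3.0, 2.5, 1.25, 3.5, 1.5, 2.0]
grades = [89, 88, 82, 77, 90, 91, 80, 93, 81, 86]

a, b = fit_line(hours, grades)
predicted = a + b * 2.25  # predicted grade for 2.25 hours of study

print(f"a = {a:.2f}, b = {b:.2f}, predicted grade = {predicted:.2f}")
```

Because the script keeps 𝑎 and 𝑏 unrounded, its prediction can differ in the last decimal place from the 86.32 obtained with the rounded coefficients 72.26 and 6.25.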
Take note: To solve problems involving correlation and regression, we have the following
steps.
Step 1: Using 5 steps of hypothesis testing, solve for the degree r to determine the
strength of relationship between the two variables (see the formula for Pearson r) and
determine whether there is a significant relationship between two variables by solving
the computed t-value (see the formula for t).
Step 2: If a significant relationship exists between the two variables, then
proceed to regression analysis; otherwise, do not proceed to regression
analysis.
Step 3: For regression analysis, solve first for the mean of 𝑦 (𝑦̅) and the mean of 𝑥 (𝑥̅ )
Step 4: Solve for the value of b (slope of the line) and a (y-intercept).
Step 5: Determine the equation y = a + bx. Remember that y is the dependent variable
and x is the independent variable. In our example, the achievement grade (𝑦) is
dependent on the number of hours spent in studying (x). Using the simple regression
line, we can predict the dependent variable y (which is the achievement grade in our
example) using the independent variable x (which is the number of hours spent in
studying in our example) and the equation y = a + bx.
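The steps above can be sketched as a single routine that runs the regression only when the correlation is significant. This is an illustration, not the module's own procedure: the function name correlate_then_regress is hypothetical, and the critical value 2.31 is an assumption (the two-tailed table value for df = 8 at the 0.05 level, as looked up in Example 1).

```python
import math


def correlate_then_regress(xs, ys, critical_t):
    """Test r for significance, then fit y = a + b*x only if significant."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2

    # Step 1: strength of the relationship and its computed t-value
    r = sxy / math.sqrt(sxx * syy)
    t_value = r * math.sqrt((n - 2) / (1 - r ** 2))

    # Step 2: stop here if the relationship is not significant
    if abs(t_value) < critical_t:
        return {"r": r, "t": t_value, "line": None}

    # Steps 3 to 5: means, slope, intercept, and the regression line
    b = sxy / sxx
    a = sum_y / n - b * (sum_x / n)
    return {"r": r, "t": t_value, "line": (a, b)}


hours = [2.5, 2.75, 1.5, 1.0, 3.0, 2.5, 1.25, 3.5, 1.5, 2.0]
grades = [89, 88, 82, 77, 90, 91, 80, 93, 81, 86]

# critical_t = 2.31 is an assumed table value (df = 8, two-tailed, 0.05 level)
result = correlate_then_regress(hours, grades, critical_t=2.31)
print(result)
```

Returning None for the line when the test is not significant mirrors Step 2: no regression equation is reported unless the relationship passes the significance test.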
Challenge Yourself!
1. The following table shows the amount of converted sugar in a chemical process at
different temperatures.
Gauge Your Learning!
Solve the following problems and show your complete solution.
1. SAT (Scholastic Ability Test)
Educational researchers desired to find out if a relationship exists between the average SAT
verbal score and the average SAT mathematical score. There were ten randomly selected, and
their SAT average scores are recorded below. Is there sufficient evidence to conclude a
relationship between the two scores at 0.05 level?
Verbal 𝑥    Math 𝑦
95          87
89          88
76          73
65          60
72          75
80          69
73          76
71          78
66          62
90          87