Learning Module 5 Hypothesis Testing - WITH ANSWER
Inc.
Cagayan de Oro City
Session Topic Learning Outcome Learning Activities Resources 4 PRONGED Expected Output /
and Media INTEGRATION Assessment
Testing Hypothesis . Teacher’s Core/
- What is Hypothesis Testing 1. Make decisions based on Given a situation ask Do activities in the prepared RelatedValues
9 to 10
th th
- Procedure for Testing a evidence, moral norms and students to formulate null module Module Table of Area Under
week Hypothesis imperatives and alternate hypothesis, Accuracy the Normal Curve
- One-Tailed and Two-Tailed and practice discernment before test the hypothesis, Collaborative activity Neatness
Tests of Significance making decisions. (GEC, ILO1) justify decision to reject with research group Calculator
- Testing for a Population Mean 2. show critical and analytical or fail to reject the null ICV:
with a Known Population thinking skills and creativity in hypothesis and formulate Practice work Excellence
Standard Deviation analyzing test results and conclusion. Chalk and board
-Testing for a Population Mean: identifying its implications and Sharing of output RV:
Population Standard applications in real life (ILO 2, Competence
Deviation Unknown GEC) Self Reliance
-Tests Concerning Proportions
-Correlation
- Regression
If Time Permit:
Chi Square Test of
Independence
Chapter 5
Hypothesis Testing
Introduction
Researchers are interested in answering many types of questions. For example, a teacher might want to know whether the new teaching technique is better than
the other. A hotel manager might want to know if the public would patronize a hotel that offers free transportation around the city. A nurse might like to know if
teaching the patient to independently take care of himself will decrease length of hospital stay. These types of questions can be addressed through statistical
hypothesis testing.
In this chapter we will conduct a test of hypothesis regarding the validity of a statement about a population parameter.
What is a Hypothesis?
Hypothesis: A conjecture or statement about a population parameter developed for the purpose of testing.
In statistical analysis we make a claim, that is, state a hypothesis, and then follow up with tests to verify the assertion or to determine that it is untrue.
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal
procedures used by statisticians to accept or reject statistical hypotheses.
Hypothesis testing starts with a statement about a population parameter such as the mean.
Hypothesis testing: A procedure based on sample evidence and probability theory to determine whether the hypothesis is a
reasonable statement.
For example, one statement about the performance of a new model car is that the mean miles per gallon is 30. The other statement is that the mean miles per
gallon is not 30. Only one of these statements is correct.
The first step is to state the hypothesis being tested. It is called the null hypothesis, designated H0, and read H sub zero. The capital letter H stands for
hypothesis, and the subscript zero implies “no difference.”
Null hypothesis: A statistical hypothesis that states that there is no difference between a parameter and a specific value or
that there is no difference between two parameters.
For example, a researcher made the claim that the mean length of a hotel stay was 2.5 days. You think that the true length of stay is some other length than 2.5
days.
The null hypothesis is written H0: μ = 2.5, where H0 is an abbreviation of the null hypothesis. The null hypothesis will always contain the equal sign. It is the
statement about the value of the population parameter, in this case the population mean. The null hypothesis is established for the purpose of testing. On the basis
of the sample evidence, it is either rejected or not rejected.
Alternate hypothesis: A statistical hypothesis that states a specific difference between a parameter and a specific value or states that
there is a difference between two parameters.
The alternate hypothesis is written H1. From the above example the alternate hypothesis is that the mean length of stay is not 2.5 days. It is written H1: μ ≠ 2.5 (≠ is
read “not equal to”). H1 is accepted only if H0 is rejected. When the “≠” sign appears in the alternate hypothesis, the test is called a two-tailed test.
In research, a hypothesis states your prediction about what you will find in your study. It contains two or more variables that are measurable or potentially measurable and specifies how the variables are related. It includes the specific group being studied and proposes an expected result. Below are some examples:
Example A:
H0: There is no significant difference in the level of customers' willingness to buy in Store A when grouped according to economic level.
H1: Customers belonging to a higher income level have a higher level of willingness to buy in Store A than those who belong to a lower income level.
H0: There is no significant relationship between customers' cultural beliefs and their food preference.
H1: Customers' cultural beliefs influence their food preference.
H0: Beliefs about a presidential candidate's character traits do not influence voters' preference.
H1: Beliefs about a presidential candidate's character traits influence voters' preference.
Activity 1. Study the data that you have gathered. Write three null hypotheses on comparison of means and three null hypotheses on relationships of variables you have included in your study.
Answer:
H0: There is no significant relationship between grit and empathy and the level of work commitment.
H1: Grit and empathy influence the level of work commitment.
When we arrive at Step 5, we are ready to either accept or reject the null hypothesis. You should be aware that hypothesis testing as used by statisticians does not
provide proof that something is true in the manner that a mathematician proves a statement. However, in cases where the null hypothesis is rejected, it does
provide “proof beyond a reasonable doubt” that the null hypothesis is not true. The steps involved in hypothesis testing will now be described in more detail.
First we will concentrate on testing a hypothesis about a population mean. Then we will consider hypothesis testing for a population proportion. For a mean:
Step 1. State the null hypothesis (H0) and the alternate hypothesis (H1).
After setting up the null hypothesis and alternate hypothesis, the next step is to state the level of significance.
Level of significance: The maximum probability of rejecting the null hypothesis when it is true.
The level of significance is designated α, the Greek letter alpha. The level of significance is sometimes called the level of risk. It will indicate when the sample
mean is too far away from the hypothesized mean for the null hypothesis to be true. Usually the significance level is set at either 0.01 or 0.05, although other
values may be chosen.
Testing a null hypothesis at the 0.05 significance level, for example, indicates that the probability of rejecting the null hypothesis, even though it is true, is 0.05. The
0.05 level is also stated as the 5% level. When a true null hypothesis is rejected, it is referred to as a Type I error.
The decision whether to use the 0.01 or the 0.05 significance level, or some other value, depends on the consequences of making a Type I error. The significance
level is chosen before the sample is selected.
If the null hypothesis is not true, but our sample results indicate that it is, we have a Type II error.
For example, H0 is that the mean hospital stay is 2.5 days. Our sample evidence fails to refute this hypothesis, but actually the population mean length of stay is
4.0 days. In this situation we have committed a Type II error by accepting a false H0.
We refer to the probabilities of these two possible errors as alpha (α) and beta (β). Alpha (α) is the probability of making a Type I error and beta (β) is the probability of making a Type II error. The table below summarizes the decisions the researcher could make and the possible consequences.
Null Hypothesis     Researcher Accepts H0     Researcher Rejects H0
H0 is true          Correct decision          Type I error
H0 is false         Type II error             Correct decision
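The meaning of α as the probability of a Type I error can be checked with a small simulation. The sketch below is only an illustration: the sample size, population standard deviation, number of trials, and random seed are all assumed values, not figures from this module. It draws many samples from a population in which H0 (μ = 2.5) is really true and counts how often a two-tailed z test at α = 0.05 rejects it.

```python
import random
from statistics import NormalDist

def type_one_error_rate(trials=5000, n=25, mu0=2.5, sigma=1.0, alpha=0.05, seed=1):
    """Fraction of samples, drawn while H0 is TRUE, that a two-tailed z test rejects."""
    rng = random.Random(seed)                       # fixed seed (assumed) for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:                         # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The printed rate should land close to the chosen α, which is the sense in which α is "the probability of rejecting the null hypothesis when it is true."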
A test statistic is a quantity calculated from the sample information and is used as the basis for deciding whether or not to reject the null hypothesis.
Test statistic: A value, determined from sample information, used to determine whether to reject the null hypothesis.
Exactly which test statistic to employ is determined by factors such as whether the population standard deviation is known.
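In hypothesis testing for the mean μ, when σ is known, the test statistic z is computed by Formula [5-1]:
Testing a Mean, σ Known: $z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ [5-1]
where:
z is the value of the test statistic.
$\bar{X}$ is the sample mean.
μ is the population mean.
σ is the standard deviation of the population.
n is the number in the sample.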
A decision rule is based on H0 and H1, the level of significance, and the test statistic.
Decision rule: A statement of the conditions under which the null hypothesis is rejected and conditions under which it is not rejected.
The region or area of rejection indicates the location of the values that are so large or so small that the probability of their occurrence for a true null hypothesis is
rather remote.
If we are applying a one-tailed test, there is one critical value. If we are applying a two-tailed test, there are two critical values.
Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
Chart 5.1 shows the conditions under which the null hypothesis is rejected, using the 0.05 significance level, a one-tailed test, and the standard normal distribution.
1. The area where the null hypothesis is not rejected is to the left of 1.65.
2. The area of rejection is to the right of 1.65.
3. A one-tailed test is being applied.
4. The 0.05 level of significance was chosen.
5. The sampling distribution of the test statistic z is normally distributed.
6. The value 1.65 separates the regions where the null hypothesis is rejected and where it is not rejected.
7. The value 1.65 is called the critical value.
When is the standard normal distribution used? It is appropriate when the population is normal and the population standard deviation is known.
If the computed value of z is greater than 1.65, the null hypothesis is rejected. If the computed value of z is less than or equal to 1.65, the null hypothesis is not
rejected.
Step 5. Compute the value of the test statistic, make a decision, and interpret the results.
The final step in hypothesis testing after selecting the sample is to compute the value of the test statistic. This value is compared to the critical value, or values,
and a decision is made whether to reject or not to reject the null hypothesis. Interpret the results.
We need to differentiate between a one-tailed test of significance and a two-tailed test of significance.
Chart 5.1 above depicts a one-tailed test. The region of rejection is only in the right (upper) tail of the curve.
Chart 5.2 depicts a situation where the rejection region is in the left (lower) tail of the normal distribution.
Chart 5.3 depicts a situation for a two-tailed test where the rejection region is divided equally into the two tails of the normal distribution.
Chart 5.3 Regions of Non-rejection and Rejection for a Two-Tailed Test, 0.05 Level of Significance
Example:
Suppose the researcher chooses α = 0.01 to test a hypothesis. What is the critical z value?
For a left-tailed test: The critical value falls to the left of the mean. Using the table for the Standard Normal Distribution, the closest value to 0.01 is 0.0099. This gives a critical value of -2.33.
For a right-tailed test: The critical value falls to the right of the mean. Thus, the critical value is 2.33.
For a two-tailed test: The critical region must be split into two equal parts. If α = 0.01, then half of the area, or 0.005, must be in each tail. The closest value to 0.0050 is 0.0049. Thus the critical values are -2.58 and 2.58.
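The table lookups above can be cross-checked in software. The following is a minimal sketch using Python's standard library; the function name `critical_z` and its `tail` argument are illustrative choices, not part of the module.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Critical z value(s) for a left-, right-, or two-tailed test."""
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    values = ", ".join(f"{z:.2f}" for z in critical_z(0.01, tail))
    print(f"alpha = 0.01, {tail}-tailed: {values}")
```

Rounded to two decimals, the results agree with the table values -2.33, 2.33, and ±2.58.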
https://www.socscistatistics.com/tests/criticalvalues/default.aspx
Activity 2. Find the critical values for each situation and draw the appropriate figure.
Suppose we are concerned with a single population mean. We want to test if our sample mean could have been obtained from a population with a particular
hypothesized mean. For example, we may be interested in testing whether the mean starting salary of recent social work graduates is equal to P96,000 per year. It
is assumed that:
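Under these conditions the test statistic follows the standard normal distribution, with the sample standard deviation s substituted for σ. Thus we use text formula [5-1] with s in place of σ:
$z = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
where $\bar{X}$ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation, and n is the number in the sample.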
A researcher reports that the average monthly salary of nurses is more than P22,000.00. A sample of 30 nurses has a mean salary of P23,200.00. At α = 0.05, test the claim that nurses earn more than P22,000 a month. The standard deviation is σ = P2,500.00.
Step 4. Find the area of rejection by finding the critical value. At α = 0.05, z critical = 1.65.
Step 5. Make a decision regarding the null hypothesis based on the sample information. Interpret the results of the test. Draw your conclusion.
Since the computed value z = (23,200 − 22,000)/(2,500/√30) ≈ 2.63 is greater than the critical value of 1.65, the null hypothesis is rejected. It can be concluded that the difference in the salaries is significant: the mean monthly salary of nurses is more than P22,000.
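The arithmetic of this right-tailed test can be sketched in a few lines of Python. The inputs are the figures stated in the problem (sample mean P23,200, hypothesized mean P22,000, σ = P2,500, n = 30); the helper name `z_statistic` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Formula [5-1]: z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

z = z_statistic(sample_mean=23200, mu0=22000, sigma=2500, n=30)
z_crit = NormalDist().inv_cdf(1 - 0.05)   # right-tailed critical value, about 1.645
reject_h0 = z > z_crit                    # decision rule for a right-tailed test

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

With these inputs the statistic is about 2.63, which lies in the rejection region to the right of the critical value.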
Try This
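1. A researcher reports that the average salary of assistant professors is more than P42,000.00. A sample of 30 assistant professors has a mean salary of P43,260.00. At α = 0.05, test the claim that assistant professors earn more than P42,000 a year. The standard deviation is σ = P5,230.00.
Solution:
Hypotheses:
H0: µ ≤ 42,000
H1: µ > 42,000 (claim)
$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where:
z is the value of the test statistic.
$\bar{X}$ is the sample mean.
μ is the population mean.
σ is the standard deviation of the population.
n is the number in the sample.
Substituting, z = (43,260 − 42,000)/(5,230/√30) ≈ 1.32. At α = 0.05 for a right-tailed test, the critical value is 1.65. Since 1.32 is less than 1.65, the null hypothesis is not rejected. There is not enough evidence to support the claim that assistant professors earn more than P42,000 a year.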
2. A national magazine claims that the average college student watches less television than the general public. The national average is 29.4 hours per week, with σ = 2 hours. Is there enough evidence to support the claim at α = 0.01?
Solution :
Hypotheses:
H0: µ ≥ 29.4
H1: µ< 29.4 (claim)
Since there is no population mean, and number of population, therefore in the substitution of the formula
3. The Medical Foundation reports that the average cost of rehabilitation for stroke victims is P24,672. The average cost of rehabilitation of a sample of 35 victims is P25,226, with σ = P3,251. At α = 0.01, can it be concluded that the average cost at a large hospital is different from P24,672?
Solution:
Hypotheses:
H0: µ = 24,672
H1: µ ≠ 24,672
Here α = 0.01.
Since α = 0.01 and the test is a two-tailed test, the critical value is $z_{\alpha/2} = 2.58$.
The test value is z = (25,226 − 24,672)/(3,251/√35) ≈ 1.01.
Since the test value, 1.01, is less than 2.58, it does not fall in the critical region, which is $|z| > z_{\alpha/2}$. The null hypothesis is not rejected.
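For a two-tailed test the decision rule compares the absolute value of z with the critical value. The sketch below reworks the rehabilitation-cost problem with its stated figures at α = 0.01; the function name `two_tailed_z_test` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """Return (z, critical value, reject?) for H0: mu = mu0 against H1: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject_h0 = two_tailed_z_test(
    sample_mean=25226, mu0=24672, sigma=3251, n=35, alpha=0.01
)
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject_h0}")
```

The computed z of about 1.01 falls between -2.58 and 2.58, so the sample does not give enough evidence that the average cost differs from P24,672.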
In the process of testing a hypothesis, we compared the test statistic to a critical value. We made a decision to either reject the null hypothesis or not to reject it.
The question is often asked as to how confident we were in rejecting the null hypothesis.
A p-value is frequently compared to the significance level to evaluate the decision regarding the null hypothesis. It is a means of reporting how strongly the sample evidence contradicts H0.
p-value: The probability of observing a sample value as extreme as, or more extreme than, the value observed,
given that the null hypothesis is true.
● If the p-value is greater than the significance level, then H0 is not rejected.
● If the p-value is less than the significance level, then H0 is rejected.
● The p-value for a given test depends on three factors:
1. whether the alternate hypothesis is one-tailed or two-tailed
2. the particular test statistic that is used
3. the computed value of the test statistic
For example, if α = 0.05 and the p-value is 0.0025, H0 is rejected. We report that, if H0 were true, a sample result at least this extreme would occur with a probability of only 0.0025.
The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null
hypothesis were true, a sample result this extreme would occur only 2% of the time.
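A common misunderstanding is that the p value is the probability that the null hypothesis is true. You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining a sample result at least as extreme as the one observed if the null hypothesis were true.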
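The definition of the p-value translates directly into an area under the standard normal curve. The sketch below assumes the test statistic is a z value; the function name `p_value` and the example z values are illustrative.

```python
from statistics import NormalDist

def p_value(z, tail):
    """Probability of a z at least as extreme as the one observed, assuming H0 is true."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                      # area to the left of z
    if tail == "right":
        return 1 - cdf(z)                  # area to the right of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))       # both tails beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")

alpha = 0.05
for z, tail in [(1.01, "two"), (2.50, "right"), (-1.20, "left")]:
    p = p_value(z, tail)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z:5.2f} ({tail}-tailed): p-value = {p:.4f} -> {decision}")
```

Comparing the p-value with α always leads to the same decision as comparing z with the critical value; the p-value additionally shows how far inside or outside the rejection region the result lies.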
In most cases the population standard deviation is unknown. Thus, σ must be based on prior studies or estimated by the sample standard deviation, s. In cases where we are using s in place of σ, Formula [5-1] is modified as follows:
Testing a Mean, σ Unknown: $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$, with n − 1 degrees of freedom
https://www.socscistatistics.com/tests/criticalvalues/default.aspx
Example
Is the temperature required to damage a computer on the average less than 110 degrees? Because of the price of testing, twenty computers were tested to see
what minimum temperature will damage the computer. The damaging temperature averaged 109 degrees with a standard deviation of 3 degrees.
This is a one-tailed test, so we go to the t-table with 19 degrees of freedom to find the critical value. The test statistic is t = (109 − 110)/(3/√20) ≈ −1.49.
We see that the test statistic does not fall in the critical region. We fail to reject the null hypothesis and conclude that there is insufficient evidence to suggest that the temperature required to damage a computer is, on average, less than 110 degrees.
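The t statistic for this example can be reproduced as follows. The significance level is an assumption here (α = 0.05 is used for illustration), and the critical value 1.729 is the standard t-table entry for 19 degrees of freedom at a one-tailed 0.05 level; Python's standard library has no t distribution, so the table value is typed in directly.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))

n = 20
t = t_statistic(sample_mean=109, mu0=110, s=3, n=n)
df = n - 1

T_CRITICAL = 1.729             # t-table, df = 19, one-tailed, alpha = 0.05 (assumed level)
reject_h0 = t < -T_CRITICAL    # left-tailed test: H1 says the mean is LESS than 110

print(f"t = {t:.2f} with {df} df, critical value = {-T_CRITICAL}, reject H0: {reject_h0}")
```

Because -1.49 is not below -1.729, the statistic stays outside the critical region, which matches the conclusion above.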
What is a proportion?
Proportion: The fraction, ratio, or percent indicating the part of the population or sample having a particular trait of interest.
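If we let p stand for the sample proportion and π for the hypothesized population proportion, then text formula [5-3] is:
$z = \dfrac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$ [5-3]
where:
p is the sample proportion (the number in the sample having the trait, divided by n).
π is the hypothesized population proportion.
n is the number in the sample.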
There are three formats for testing a hypothesis about a proportion. For a one-tailed test there are two possibilities, depending on the intent of the researcher. For example, if we wanted to determine whether more than 25 percent of the sales of homes were sold to first time buyers, the hypotheses would be given as follows:
H0: π ≤ 0.25
H1: π > 0.25
If we wanted to find out whether fewer than 25 percent of the homes were sold to first time buyers, the hypotheses would be given as:
H0: π ≥ 0.25
H1: π < 0.25
For a two-tailed test the hypotheses would be given as:
H0: π = 0.25
H1: π ≠ 0.25
where ≠ means “not equal to.” Rejection of H0 and acceptance of H1 allows us to conclude only that the population proportion is “different from” or “not equal to” the hypothesized value. It does not allow us to make any statement about the direction of the difference.
Example
1500 randomly selected coconut trees were tested for traces of bark beetle infestation. It was found that 153 of the trees showed such traces.
Test the hypothesis that more than 10% of the coconut trees in Cagayan de Oro have been infested. (Use a 5% level of significance.)
Solution
The hypotheses are
H0: π ≤ 0.10
H1: π > 0.10
We have that p = 153/1500 = 0.102, so z = (0.102 − 0.10)/√(0.10(1 − 0.10)/1500) ≈ 0.26.
Since we are using a 5% level of significance with a one-tailed test, we have zc = 1.645. The rejection region is z > 1.645. We see that 0.26 does not lie in the rejection region, hence we fail to reject the null hypothesis. There is insufficient evidence to conclude that more than 10% of the coconut trees are infested.
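The coconut-tree test can be sketched in Python using the figures from the example (153 infested trees out of 1500, hypothesized proportion 0.10, α = 0.05). The function name `proportion_z` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, pi0):
    """z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n) for a test about one proportion."""
    p = successes / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

z = proportion_z(successes=153, n=1500, pi0=0.10)
z_crit = NormalDist().inv_cdf(1 - 0.05)   # right-tailed test at the 5% level
reject_h0 = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Note that the standard error uses the hypothesized proportion 0.10, not the sample proportion, because the test statistic is computed under the assumption that H0 is true.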
Try This
1. A mobile phone company claims that its products have a mean life span of 5 years with a standard deviation of 3 years. Test the null hypothesis that µ = 5
years against the alternative hypothesis that µ ≠ 5 years if a random sample of 50 phones was tested and found to have a mean life span of only 4 years.
Use a 0.05 level of significance.
Solution :
Hypothesis:
Ho: µ = 5
H1: µ ≠ 5
Given:
Standard deviation σ = 3
Hypothesized mean µ = 5
Sample mean = 4
Sample size n = 50
Since this is a two-tailed test, split the alpha into two.
0.05/2 = 0.025
Find the z-score associated with your alpha level. You’re looking for the area in one tail only. The z-score for 0.975 (1 − 0.025 = 0.975) is 1.96. As this is a two-tailed test, you would also be considering the left tail (z = −1.96).
Find the test statistic using the z-score formula:
z = (4 − 5) / (3/√50) = (−1)/(3/7.07) = −2.36
Reject the null hypothesis if z is less than −1.96 or greater than 1.96. In this case, z = −2.36 is less than −1.96, so you reject the null hypothesis: the mean life span differs significantly from 5 years.
2. The average length of time for a doctor to diagnose a throat illness using an old procedure is 30 minutes. Using a computerization method, a random sample
of 30 patients was used and found to have a mean length of 18 minutes with the standard deviation of 1.5 minutes. Test the significance of the difference
between the population mean and the sample mean at 0.01 level of significance.
Hypothesis:
H0: μ = 30
H1: μ ≠ 30
Given: sample mean = 18, s = 1.5, n = 30, α = 0.01.
Test statistic, with s substituted for σ:
z = (18 − 30) / (1.5/√30) = (−12)/0.274 ≈ −43.8
Since this is a two-tailed test at the 0.01 level, the critical values are −2.58 and 2.58. The computed value of −43.8 is far below −2.58, and the corresponding p-value is essentially 0, which is less than 0.01. We reject the null hypothesis. The data support the conclusion that the mean diagnosis time using the computerized method differs significantly from the 30 minutes of the old procedure.
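3. If 80% of the nation are Catholics, does the Misamis Oriental environment reflect the national proportion? Test the hypothesis that Misamis Oriental residents differ from the rest of the nation in their religion, if of 200 locals surveyed, 115 are Catholics.
Solution:
The sample proportion is 115/200 = 0.575.
z = (0.575 − 0.80) / √(0.80(1 − 0.80)/200) = (0.575 − 0.80) / √0.0008 ≈ −7.95
Because |z| = 7.95 is far beyond the two-tailed critical values at the usual significance levels (1.96 at 0.05 and 2.58 at 0.01), the null hypothesis that the local proportion equals 0.80 is rejected.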
Sample:
A manager claims that female employees miss more days than male employees. The data below show the number of days missed last year by 40 male and 40 female employees. Is there sufficient evidence to believe the manager’s statement at α = 0.05?
Male Employees: 1 3 5 7 2 4 2 1 6 2 3 4 6 2 8 10 6 12 14 4 3 2 5 4 7 8 2 1 2 1 4 5 8 6 9 12 9 10 3 6
Female Employees: 4 2 4 6 8 3 9 10 5 6 3 6 5 8 9 3 11 8 2 7 14 6 5 4 6 7 8 3 5 7 2 5 8 9 11 7 9 0 12 11
Solution:
Male Female
1 4
3 2
5 4
7 6
2 8
4 3
2 9
1 10
6 5
2 6
3 3
4 6
6 5
2 8
8 9
10 3
6 11
12 8
14 2
4 7
3 14
2 6
5 5
4 4
7 6
8 7
2 8
1 3
2 5
1 7
4 2
5 5
8 8
6 9
9 11
12 7
9 9
10 0
3 12
6 11
2. First, perform an F-Test to determine if the variances of the two populations are equal.
Note: if the Data Analysis button is not visible, load the Analysis ToolPak add-in first.
Male Female
Mean 5.225 6.45
Variance 11.46089744 9.7410256
Observations 40 40
df 39 39
F 1.176559621
P(F<=f) one-tail 0.307066699
F Critical one-tail 1.704465067
3. Since F < F critical, the two variances are not significantly different, so select t-Test: Two-Sample Assuming Equal Variances and click OK.
8. Click OK.
Result:
                               Male    Female
Observations                   40      40
Hypothesized Mean Difference   0
df                             78
t Stat                         -1.682590595
P(T<=t) one-tail               0.048226781
Conclusion: We do a one-tail test (inequality). If t Stat < -t Critical one-tail or t Stat > t Critical one-tail, we reject the null hypothesis. Since -1.68259 < -1.664885, we reject the null hypothesis. The observed difference between the sample means (6.45 - 5.225 = 1.225) is convincing enough to say that female employees miss significantly more days on average than male employees.
Access: https://www.excel-easy.com/examples/t-test.html
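The Excel output for this example can be cross-checked outside the spreadsheet. The sketch below recomputes the means, variances, F ratio, and pooled (equal-variance) t statistic from the data in the problem. The critical value 1.664885 is the one quoted in the conclusion above; it is typed in by hand because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

male = [1, 3, 5, 7, 2, 4, 2, 1, 6, 2, 3, 4, 6, 2, 8, 10, 6, 12, 14, 4,
        3, 2, 5, 4, 7, 8, 2, 1, 2, 1, 4, 5, 8, 6, 9, 12, 9, 10, 3, 6]
female = [4, 2, 4, 6, 8, 3, 9, 10, 5, 6, 3, 6, 5, 8, 9, 3, 11, 8, 2, 7,
          14, 6, 5, 4, 6, 7, 8, 3, 5, 7, 2, 5, 8, 9, 11, 7, 9, 0, 12, 11]

n1, n2 = len(male), len(female)
mean_m, mean_f = mean(male), mean(female)
var_m, var_f = variance(male), variance(female)    # sample variances (n - 1 divisor)

# F test for equality of variances: ratio of the two sample variances
f_ratio = var_m / var_f

# Pooled-variance t statistic for the difference between two means
pooled_var = ((n1 - 1) * var_m + (n2 - 1) * var_f) / (n1 + n2 - 2)
t_stat = (mean_m - mean_f) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

T_CRITICAL = 1.664885              # one-tail critical value quoted in the conclusion
reject_h0 = t_stat < -T_CRITICAL   # H1: the female mean is higher than the male mean

print(f"means: {mean_m:.3f} (male), {mean_f:.3f} (female)")
print(f"F = {f_ratio:.4f}, t = {t_stat:.5f}, df = {df}, reject H0: {reject_h0}")
```

The values agree with the spreadsheet tables: means 5.225 and 6.45, F of about 1.1766, and t of about -1.68259 with 78 degrees of freedom.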
Try This
Activity 5
A researcher claims that male students are better in mathematics than female students. He gathered the results of the recent Math Achievement test. The data below show the scores of the students. Is there sufficient evidence to believe the researcher’s statement at α = 0.05?
Male Students: 30 30 15 17 12 40 27 31 36 32 33 41 26 22 18 10 16 12 14 14 31 28 25 34 37 28 22 41 32 19 42 28 38 16 9 12 9 10 23 16
Female Students: 14 22 41 36 18 34 29 16 35 36 31 26 15 18 19 23 31 18 27 37 14 36 25 34 46 17 28 33 35 27 42 35 38 29 19 27 39 40 12 11
Take the screenshots of the results. What would be the correct decision? Why? What is your conclusion?
Answer:
Another tool that can be used in making a decision is the p-value. Recall that H0 is rejected when the p-value is less than the significance level and is not rejected when the p-value is greater than the significance level.
Look at This
Male Female
Mean 5.225 6.45
Variance 11.4609 9.741026
Observations 40 40
Hypothesized Mean Difference 0
df 77
t Stat -1.68259
P(T<=t) one-tail 0.048253
t Critical one-tail 1.664885
P(T<=t) two-tail 0.096505
t Critical two-tail 1.991254
At α = 0.05, P(T<=t) one-tail = 0.048253, which is less than 0.05. Thus, the null hypothesis is rejected.
Question: Did the decision change? What is the basis for your decision?
Note that we use a z-test or t-test if we are comparing only two means. What if we have more than two?
Look at This
A researcher would like to know if there is a significant difference in the mean scores of students taking Mathematics using different modes of learning: Distance Learning, Online Learning, and Flexible Learning. Results of the Mathematics Achievement Test are shown below. Is there sufficient evidence to believe that the mean scores are equal at α = 0.05?
Scores of Students
Distance Learning: 12 23 34 25 31 36 31 27 17 16 10 24 36 37 43 32 45 33 21 15 37 33 35 38 20 21 27 26 23 29
Online Learning: 34 27 33 30 40 29 45 41 36 34 32 26 17 19 10 24 23 43 41 23 34 35 23 38 40 34 27 43 18 23
Flexible Learning: 43 12 34 30 28 21 19 15 23 24 36 37 43 35 32 33 41 41 37 38 40 42 44 46 25 23 24 34 28 24
Solution:
Step 1: Type your data into three columns.
Distance Learning   Online Learning   Flexible Learning
12 34 43
23 27 12
34 33 34
25 30 30
31 40 28
36 29 21
31 45 19
27 41 15
17 36 23
16 34 24
10 32 36
24 26 37
36 17 43
37 19 35
43 10 32
32 24 33
45 23 41
33 43 41
21 41 37
15 23 38
37 34 40
33 35 42
36 23 44
38 38 46
20 40 25
21 34 23
27 27 24
26 43 34
23 18 28
29 23 24
Step 2: Click the “Data” tab and then click “Data Analysis.”
Step 4: Type an input range into the Input Range box. Check the “Labels in first row” if you have column headers, and select the Rows radio button if your data is
in rows.
SUMMARY
Groups               Count   Sum   Average    Variance
Distance Learning    30      920   30.66667   101.954
Online Learning      30      922   30.73333   79.02989
Flexible Learning    30      952   31.73333   84.27126
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        21.42222   2    10.71111   0.121141   0.886058   3.101296
Within Groups         7692.4     87   88.41839
Total                 7713.822   89
The Summary table indicates that the mean scores range from a low of 30.67 for distance learning to a high of 31.73 for flexible learning. Our sample means are
different. However, the differences we see in our samples might be the result of random sampling error.
In the ANOVA table, the p-value is 0.886058. Because this value is greater than our significance level of 0.05, we fail to reject the null hypothesis. Our sample data
does not provide strong enough evidence to conclude that the three mean scores are not equal. Thus we conclude there is no significant difference in the mean
scores of students learning Mathematics in different modes.
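The F value in the ANOVA table can be rebuilt from the SUMMARY table alone. The sketch below is an illustration that takes each group's count, average, and variance exactly as reported in the Excel output and recomputes the sums of squares, mean squares, and F; the F crit value is copied from the table rather than computed.

```python
# (count, average, variance) for each group, copied from the SUMMARY table
summary = {
    "Distance Learning": (30, 30.66667, 101.954),
    "Online Learning": (30, 30.73333, 79.02989),
    "Flexible Learning": (30, 31.73333, 84.27126),
}

k = len(summary)                                         # number of groups
total_n = sum(n for n, _, _ in summary.values())         # total observations
grand_mean = sum(n * avg for n, avg, _ in summary.values()) / total_n

# Between-groups sum of squares: how far each group mean is from the grand mean
ss_between = sum(n * (avg - grand_mean) ** 2 for n, avg, _ in summary.values())
# Within-groups sum of squares: spread of scores inside each group
ss_within = sum((n - 1) * var for n, _, var in summary.values())

df_between, df_within = k - 1, total_n - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.101296                  # copied from the ANOVA table (alpha = 0.05)
reject_h0 = f_stat > F_CRIT

print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.1f}")
print(f"F = {f_stat:.6f}, F crit = {F_CRIT}, reject H0: {reject_h0}")
```

Because F is far below F crit, the decision matches the p-value approach: the null hypothesis of equal mean scores is not rejected.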
Read:
1. What is ANOVA
https://www.qualtrics.com/au/experience-management/research/anova/
In many instances, a researcher would like to investigate the relationship between variables. They are interested in finding relationships such as those between gender and time-off, base salary and additional compensation, tuition and retirement benefits, and time-off and vehicle allowance. What statistical tools can be used to answer questions about relationships?
To study the relationship between two variables we use two techniques: correlation analysis and regression analysis.
Correlation analysis: A group of techniques to measure the association between two variables.
The purpose of correlation analysis is to find the relationship between two variables. For example, height and weight are related; taller people tend to be
heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people
you know where the shorter one is heavier than the taller one. Correlation can tell you just how much of the variation in people's weights is
related to their heights.
Education and years in jail—people who have more years of education tend to have fewer years in jail.
There are several different correlation techniques. Some of these are Pearson or product-moment correlation and partial correlation. The latter is useful when you
want to look at the relationship between two variables while removing the effect of one or two other variables.
Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful,
usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color.
Rating Scales
Most statisticians say you cannot use correlations with rating scales, because the mathematics of the technique assumes the differences between numbers are exactly equal. Nevertheless, many survey researchers do use correlations with rating scales, because the results usually reflect the real world. One should note, however, that when working with quantities, correlations provide precise measurements, whereas when working with rating scales, correlations provide general indications.
Go To:
1. What is correlation?
https://chrome.google.com/webstore/detail/dualless/bgdpkilkheacbboffppjgceiplijhfpd
Look at This
The Promotion Manager claimed that the higher the cost of sales promotion, the higher the volume of sales. To test his claim, data for 30 businesses were gathered. The data are shown below. Test the claim of the Promotion Manager at α = 0.05.
Null hypothesis: There is no significant relationship between the cost of promotion and the volume of sales.
Solution:
                                       Expenditure in         Volume of Sale in
                                       Thousands of Pesos     Thousands of Pesos
Expenditure in Thousands of Pesos      1
Volume of Sale in Thousands of Pesos   0.63                   1
The correlation test that we used here is called Pearson’s correlation coefficient.
Pearson correlation (r) is used to measure strength and direction of a linear relationship between two variables. Mathematically this can be done by dividing the
covariance of the two variables by the product of their standard deviations.
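The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are assumptions made for the example, not the module's 30-business data set.

```python
import math

def pearson_r(x, y):
    """Pearson's r = sample covariance / (sd of x * sd of y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (divisor n - 1)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical illustration data (NOT the data used in this module)
cost = [10, 20, 30, 40, 50]
sales = [25, 30, 41, 38, 52]
r = pearson_r(cost, sales)
print(round(r, 4))
```

Because the same divisor (n - 1) appears in the covariance and in both standard deviations, it cancels out, so r is the same whether sample or population formulas are used.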
A measure of the linear (straight-line) strength of the relationship between two sets of interval-scaled or ratio-scaled variables is given by the coefficient of
correlation. The coefficient of correlation is also called Pearson's product moment correlation coefficient or Pearson's r after its founder Karl Pearson.
• The coefficient ranges from –1 to +1 and tells you how strongly two variables are related to each other.
Source: Correlation Coefficients: Appropriate Use and Interpretation. Anesthesia & Analgesia 126(5):1763-1768, May 2018.
Strength: The greater the absolute value of the correlation coefficient, the stronger the relationship.
o The extreme values of -1 and 1 indicate a perfectly linear relationship where a change in one variable is accompanied by a perfectly consistent change
in the other. For these relationships, all of the data points fall on a line. In practice, you won’t see either type of perfect relationship.
o A coefficient of zero represents no linear relationship. As one variable increases, there is no tendency in the other variable to either increase or
decrease.
o When the value is in-between 0 and +1/-1, there is a relationship, but the points don’t all fall on a line. As r approaches -1 or 1, the strength of the
relationship increases and the data points tend to fall closer to a line.
Direction: The sign of the correlation coefficient represents the direction of the relationship.
o Positive coefficients indicate that when the value of one variable increases, the value of the other variable also tends to increase. Positive relationships
produce an upward slope on a scatterplot.
o Negative coefficients indicate that when the value of one variable increases, the value of the other variable tends to decrease. Negative
relationships produce a downward slope on a scatterplot.
                                       Expenditure in         Volume of Sale in
                                       Thousands of Pesos     Thousands of Pesos
Expenditure in Thousands of Pesos      1
Volume of Sale in Thousands of Pesos   0.63                   1
Thus, going back to the results, the cost of promotion and the volume of sales have a positive and strong relationship.
Is this relationship significant? To answer the question, convert r to t using the formula:
t = r √(n − 2) / √(1 − r²), with df = n − 2
Using the t-table, the critical value of t at α = 0.05, one-tailed test, df = n − 2 = 30 − 2 = 28, is 1.701. Since the computed value of t is greater than the critical
value, the null hypothesis is rejected. Thus, there is a significant positive and strong relationship between the cost of promotion and the volume of sales.
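The conversion of r to t can be checked with a short Python sketch. It assumes the standard formula t = r√(n − 2)/√(1 − r²) and uses the rounded r = 0.63 from the correlation table, n = 30, and the critical value 1.701 quoted above. The function name `t_from_r` is only illustrative.

```python
import math

def t_from_r(r, n):
    """Convert a Pearson correlation r (sample size n) to a t statistic with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.63            # correlation from the table (rounded)
n = 30              # number of businesses
t_critical = 1.701  # t-table value quoted in the text: alpha = 0.05, one-tailed, df = 28

t_computed = t_from_r(r, n)
reject_null = t_computed > t_critical

print(f"computed t = {t_computed:.3f}, df = {n - 2}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Using the unrounded correlation from a regression output instead of 0.63 would change the computed t slightly, but not the decision.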
Study This
Given the correlation results below, interpret the results in terms of strength and direction. Then test for its significance.
1.
                             Length of Service (years)   Number of Absences
Length of Service (years)    1
Number of Absences           -0.43                       1
2. The correlation between number of employees laid off and the level of satisfaction of the remaining customer is -0.52.
In many instances, it is not enough to know that there is a significant relationship between or among variables. A business analyst would like to understand how the
typical value of the dependent variable changes when one of the independent variables is varied while the other independent variables are held fixed.
Regression analysis helps organizations make sense of their data and use them to make strategic business decisions.
Look at This.
The Promotion Manager claimed that the higher the cost of sales promotion, the higher the volume of sales. To test this claim, data for 30 businesses were gathered.
The data are shown below. Test the claim of the Promotion Manager at α = 0.05.
Suppose that in this example the researcher further asks: What volume of sales would be expected if the cost of promotion is 150?
What linear regression equation can best predict the volume of sales based on the cost of promotion?
Regression analysis is a way of mathematically sorting out which variables do indeed have an impact on the outcome. It answers the questions: Which factors matter
most? Which can we ignore? How do those factors interact with each other? And, perhaps most importantly, how certain are we about all of these factors?
Regression analysis allows us to find an equation that shows the linear (straight line) relationship between two variables. The equation is used to estimate Y based
on X and is referred to as the regression equation.
Regression Equation: An equation that expresses the linear relationship between two variables.
The linear relationship between two variables is given by the general form of the regression equation.
Ŷ = a + bX
where:
Ŷ (read Y hat) is the estimated value of the Y variable for a selected X value.
a is the Y intercept. It is the estimated value of Y when X = 0.
b is the slope of the line. It measures the change in Ŷ for each unit change in X. It will always have the same sign as the coefficient of
correlation.
X is any value of the independent variable that is selected.
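To make the roles of a and b concrete, the least-squares line can be computed by hand in Python. This is a minimal sketch under stated assumptions: it uses the usual least-squares formulas b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄, the function name `fit_line` is illustrative, and the data are hypothetical rather than the module's data set.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y-hat = a + b * X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y-hat per unit change in X
    a = mean_y - b * mean_x  # intercept: Y-hat when X = 0
    return a, b

# Hypothetical illustration data (NOT the data used in this module)
cost = [10, 20, 30, 40, 50]
sales = [25, 30, 41, 38, 52]
a, b = fit_line(cost, sales)
print(f"Y-hat = {a:.2f} + {b:.2f} X")
```

The slope shares its numerator, Σ(X − X̄)(Y − Ȳ), with the correlation coefficient, which is why b and r always have the same sign.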
4. Select the X Range. These are the explanatory variables (also called independent variables). These columns must be adjacent to each other.
5. Check Labels.
7. Check Residuals.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.63365453
R Square             0.40151806
Adjusted R Square    0.38014371
Standard Error       11.5979778
Observations         30

ANOVA
             df   SS            MS          F           Significance F
Regression   1    2526.833485   2526.8335   18.785038   0.000170543
Residual     28   3766.366515   134.51309
Total        29   6293.2

                                         Coefficients   Standard Error   t Stat     P-value     Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept                                22.4722624     5.740918609      3.914401   0.0005284   10.71252373   34.23200108   10.71252373   34.23200108
Cost of Promotion in Thousand of Pesos   0.45230256     0.104357342      4.334171   0.0001705   0.238536236   0.666068885   0.238536236   0.666068885
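The figures in this output are tied together by a few identities, which a short Python sketch can confirm from the sums of squares alone. The identities assumed here are the standard ones for simple linear regression with one predictor: R² = SS regression / SS total, adjusted R² = 1 − (1 − R²)(n − 1)/(n − 2), F = MS regression / MS residual, and standard error = √(MS residual).

```python
import math

# Values copied from the SUMMARY OUTPUT
n = 30
multiple_r = 0.63365453
ss_regression = 2526.833485
ss_residual = 3766.366515

ss_total = ss_regression + ss_residual            # 6293.2 in the ANOVA table
r_squared = ss_regression / ss_total              # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
ms_residual = ss_residual / (n - 2)               # residual df = n - 2 = 28
f_stat = (ss_regression / 1) / ms_residual        # regression df = 1
standard_error = math.sqrt(ms_residual)

print(f"R Square          = {r_squared:.8f}")
print(f"Multiple R ** 2   = {multiple_r ** 2:.8f}")
print(f"Adjusted R Square = {adj_r_squared:.8f}")
print(f"F                 = {f_stat:.6f}")
print(f"Standard Error    = {standard_error:.7f}")
```

With a single predictor, the F statistic is also the square of the slope's t statistic, so the Significance F and the P-value of the slope test the same hypothesis.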
1. Multiple R. The correlation coefficient is 0.63365. It means the linear relationship is positive and strong.
2. R Square (r²) is 0.4015. This is the coefficient of determination. It implies that about 40% of the variation in Volume of Sale is explained by the independent
variable, Cost of Sale Promotion.
4. Coefficients
The regression line is: Ŷ = Volume of Sales = 22.47 + 0.4523 (Cost of Promotion). In other words, for each one-unit increase in the cost of promotion (one thousand
pesos), the predicted volume of sales increases by 0.4523 units (thousand pesos).
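The researcher's question about a promotion cost of 150 can be answered by substituting X = 150 into the fitted line. The sketch below uses the unrounded coefficients from the output. The function name `predict_sales` is illustrative, and the result is in thousands of pesos, assuming 150 is also expressed in thousands of pesos.

```python
# Coefficients copied from the regression output
INTERCEPT = 22.4722624   # a
SLOPE = 0.45230256       # b

def predict_sales(cost_of_promotion):
    """Estimated volume of sales (thousand pesos) for a given promotion cost (thousand pesos)."""
    return INTERCEPT + SLOPE * cost_of_promotion

predicted = predict_sales(150)
print(f"Predicted volume of sales at X = 150: {predicted:.2f} thousand pesos")
```

This gives an estimate of about 90.32 thousand pesos. The prediction is most trustworthy when 150 lies within the range of promotion costs actually observed in the 30 businesses.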
TRY THIS
Activity 6. Interpret the following results.
1. Null Hypothesis: Number of years in school does not impact crime rate.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.73000000
R Square             0.53290000
Adjusted R Square    0.5014371
Standard Error       10.5979778
Observations         40

ANOVA
             df   SS            MS          F           Significance F
Regression   1    2526.833485   2526.8335   12.785038   0.005000
Residual     28   3766.366515   134.51309
Total        29   6293.2
SUMMARY OUTPUT

Regression Statistics
Multiple R           -0.5210000
R Square             0.2714410
Adjusted R Square    0.1801431
Standard Error       5.5979778
Observations         50

ANOVA
             df   SS            MS          F           Significance F
Regression   1    256.833485    256.8335    10.785038   0.004300
Residual     28   3766.366515   134.51309
Total        48   6293.2
Performance Task
Study the data that you have gathered. Finalize and test your hypotheses. Describe your findings, and generate conclusions and recommendations. Prepare your
write-up and be ready to share your research findings with the class during our synchronous session. You will be graded according to the following criteria: Research
Questions and Hypotheses, Introduction, Design, Organization, Quality, and the presentation and written report of your work.
Reflection Time:
4. In what way can you, as a graduate student, contribute to the generation of new knowledge that is useful in adjusting to the new normal brought about by the pandemic?
5. Write one question that you believe research can answer during this pandemic.