STATPROB 4TH (FINALS)
1.1 Hypothesis Testing on Two Population Means with Independent Samples
✅ Also known as the independent samples t-Test.
✅ This t-Test procedure is used to compare the means of two groups that are independent of one another.

Test Statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_d}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where µd is the hypothesized difference between the two population means (0 when H0 states μ1 = μ2).

Pooled Variance
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Example 5.8 (Page 203)
1.) A random sample of scores obtained in a statistics examination by eight male students and a random sample of scores obtained in the same examination by six female students are shown below:
Test whether the average scores of the male and female students in this statistics examination are equal at the 5% level of significance. Assume that the statistics examination scores of male and female students are normally distributed with unknown but equal variance.

STEP 1: Construct the null hypothesis
Ho: The average scores of male and female students in this statistics examination are equal, that is, $\mu_1 = \mu_2$. There is no significant difference (if equal).
Ha: The average scores of male and female students in this statistics examination are not equal, that is, $\mu_1 \neq \mu_2$. There is a significant difference (if not equal).

STEP 2: Level of Significance
α = 0.05

STEP 3: Pooled variance
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(8 - 1)(9.6)^2 + (6 - 1)(14)^2}{(8 + 6) - 2}$$
$$S_p^2 = 135.4$$
where x̄1 and x̄2 are the male and female sample means from the table:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{135.4\left(\frac{1}{8} + \frac{1}{6}\right)}}$$
$$t = 1.81$$

STEP 4: Find the critical values
df = n1 + n2 − 2 = 8 + 6 − 2 = 12
α/2 = 0.05/2 = 0.025 (see Appendix C)
Critical values = ±2.179

STEP 5: Make a conclusion
Since the computed t-test statistic (t = 1.81) does not fall in the rejection region (it lies between −2.179 and 2.179), do not reject H0. Therefore, at α = 5% there is no significant difference between the average scores of male and female students in the statistics examination.
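The pooled-variance t-test in Steps 3 and 4 can be checked with a short Python sketch that works from summary statistics (sample sizes, means, and standard deviations). The helper names `pooled_variance` and `pooled_t` are illustrative, not from any library. Because the raw scores for Example 5.8 are not reproduced in these notes, only its pooled variance is checked; the full t computation uses the marital-age figures from Example 2 (n = 27 and 25, means 28.5 and 26.1, SDs 1.54 and 0.89).

```python
import math

def pooled_variance(n1: int, s1: float, n2: int, s2: float) -> float:
    """Sp^2 = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2); s1 and s2 are standard deviations."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def pooled_t(mean1: float, mean2: float, n1: int, s1: float,
             n2: int, s2: float, mu_d: float = 0.0) -> tuple:
    """Return (t, df) for the equal-variance independent samples t-test."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - mu_d) / standard_error
    return t, n1 + n2 - 2

# Example 5.8: pooled variance only (the raw scores are not reproduced in these notes)
print(round(pooled_variance(8, 9.6, 6, 14), 1))    # 135.4

# Example 2: marital age of 27 males vs. 25 females
t, df = pooled_t(28.5, 26.1, 27, 1.54, 25, 0.89)
print(round(t, 2), df)                              # 6.81 50
```

The critical value still comes from the t table (Appendix C); the sketch only replaces the arithmetic of Steps 3 and 4.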
2.) A researcher wants to know if there is a significant difference in the marital age of males and females. An interview with 27 males and 25 females showed that, on average, the mean marital age is 28.5 for males and 26.1 for females, with standard deviations of 1.54 and 0.89, respectively. At the 0.05 level of significance, is there a significant difference in the marital ages of males and females? Assume that the marital ages of males and females are normally distributed with unknown but equal variance.

STEP 1: Construct the null hypothesis
Ho: The average marital ages of males and females are equal, that is, $\mu_1 = \mu_2$. There is no significant difference (if equal).
Ha: The average marital ages of males and females are not equal, that is, $\mu_1 \neq \mu_2$. There is a significant difference (if not equal).

STEP 2: Level of Significance
α = 0.05

STEP 3: Pooled variance
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(27 - 1)(1.54)^2 + (25 - 1)(0.89)^2}{(27 + 25) - 2}$$
$$S_p^2 = 1.61$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{28.5 - 26.1}{\sqrt{1.61\left(\frac{1}{27} + \frac{1}{25}\right)}}$$
$$t = 6.81$$

STEP 4: Find the critical values
df = n1 + n2 − 2 = 27 + 25 − 2 = 50
α/2 = 0.05/2 = 0.025 (see Appendix C)
Critical values = ±2.009

STEP 5: Make a conclusion
Since the computed t-test statistic (t = 6.81) falls in the rejection region (6.81 > 2.009), reject H0. Therefore, at α = 5% there is a significant difference between the males and females in terms of their marital age.

2.1 Paired Sample T-test
✅ Also called the "dependent samples t-test", "t-test for repeated measures", or "t-test for matched samples".
✅ It is a statistical procedure in which each subject or entity is measured twice, resulting in pairs of observations.
▪ before – after ▪ pretest – posttest

TRY: Paired Samples t-Test or not?
1. Learning styles: A study is conducted to determine whether students perform better when they are taught using their preferred learning style. The students' test scores are recorded before and after the intervention.
Paired Samples T-test
2. Medical Research: A medical researcher wants to determine whether the mean cholesterol level of a sample of patients is significantly different from a known value. The cholesterol levels of a sample of patients are collected.
One-sample z-test
3. Classical Music: To test the effect of classical music on academic performance, senior high school students are observed. For one month, each subject studies without music. For another month, the subject studies while listening to classical music.
Paired-sample t-Test
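Because each subject is measured twice, a paired design can be reduced to one difference per subject, and the test is then run on those differences. A minimal Python sketch, assuming d = before − after (the notes do not fix the order; reversing it only flips the signs of d̄ and t). The helper names are hypothetical, and the before/after values are made up only to exercise the code.

```python
import statistics

def paired_differences(before: list[float], after: list[float]) -> list[float]:
    """Return d = before - after for each subject; both lists must be in the same subject order."""
    if len(before) != len(after):
        raise ValueError("each subject needs exactly one 'before' and one 'after' value")
    return [b - a for b, a in zip(before, after)]

def summarize_differences(d: list[float]) -> tuple:
    """Return (d_bar, Sd): the mean difference and the sample SD with n - 1 in the denominator."""
    return statistics.mean(d), statistics.stdev(d)

# Hypothetical scores for four subjects, for illustration only
before = [10, 12, 9, 14]
after = [8, 11, 9, 10]
d = paired_differences(before, after)
print(d)                         # [2, 1, 0, 4]
d_bar, s_d = summarize_differences(d)
print(d_bar, round(s_d, 4))      # 1.75 1.7078
```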
Formulas:
Test Statistic
$$t = \frac{\bar{d}}{S_d / \sqrt{n}}$$
Where Sd:
$$S_d = \sqrt{\frac{\Sigma(d - \bar{d})^2}{n - 1}}$$
Here d is the difference within each pair and d̄ is the mean of the differences.

1.) A researcher wants to test whether a new teaching method improves the test scores of students. The researcher administers a pre-test and a post-test to the same group of students. The following table summarizes the scores of the 8 students in both tests.
Test if there is a significant difference in the scores of the students before and after the implementation of the new teaching method. Use 0.05 level of significance.

Solution:
STEP 1: Construct the null hypothesis
Ho: There is no significant difference in the scores of the students before and after the implementation of the new teaching method.
Ha: There is a significant difference in the scores of the students before and after the implementation of the new teaching method.

STEP 2: Level of Significance
α = 0.05

STEP 3: Critical Values
α/2 = 0.05/2 = 0.025
df = n − 1 = 8 − 1 = 7
Critical values = ±2.365

STEP 4: Test Statistic
$$t = \frac{\bar{d}}{S_d / \sqrt{n}} = 3.0025$$

STEP 5: Make a conclusion
Since the test statistic (t = 3.0025) falls in the rejection region (3.0025 > 2.365), we REJECT Ho.
Therefore, we can say that at the 5% level of significance, there is enough evidence to reject the claim that there is no significant difference in the scores of the students before and after the administration of the teaching strategy.
or
Therefore, we can say that at the 5% level of significance, there is enough evidence to support the claim that there is a significant difference in the scores of the students before and after the administration of the teaching strategy.
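Steps 3 to 5 of a paired t-test can be sketched in Python from the summary values d̄, Sd, and n. Critical values are not computed here; they are passed in from a t table (±2.365 for df = 7 at α = 0.05, and ±4.604 for df = 4 and ±3.707 for df = 6 at α = 0.01, all two-tailed). The helper names `paired_t` and `decide` are hypothetical. The summary values for the depression and vegetarian-diet examples are the ones used in their Step 4 below.

```python
import math

def paired_t(d_bar: float, s_d: float, n: int) -> tuple:
    """Return (t, df) with t = d_bar / (Sd / sqrt(n)) and df = n - 1."""
    return d_bar / (s_d / math.sqrt(n)), n - 1

def decide(t: float, t_critical: float) -> str:
    """Two-tailed rule: reject H0 when |t| exceeds the positive critical value."""
    return "reject H0" if abs(t) > t_critical else "do not reject H0"

# Example 2 (depression): d_bar = 5.2, Sd = 2.59, n = 5, alpha = 0.01
t, df = paired_t(5.2, 2.59, 5)
print(round(t, 4), df, decide(t, 4.604))   # 4.4894 4 do not reject H0

# Example 3 (vegetarian diet): d_bar = 2.71, Sd = 2.06, n = 7, alpha = 0.01
t, df = paired_t(2.71, 2.06, 7)
print(round(t, 4), df, decide(t, 3.707))   # 3.4806 6 do not reject H0

# Example 1 (teaching method): only the final t = 3.0025 is given, df = 7
print(decide(3.0025, 2.365))               # reject H0
```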
2.) A new treatment for depression was developed, and the researcher wants to test whether it is effective. The researcher recruits 5 participants who are diagnosed with depression and measures their depression symptoms using a standardized scale before and after the treatment. The data are shown below. Test the hypothesis that the mean depression symptoms before and after the treatment are the same at a significance level of 0.01.

Solution:
STEP 1: Construct the null hypothesis
Ho: There is no significant difference in the depression symptoms of participants before and after the treatment.
Ha: There is a significant difference in the depression symptoms of participants before and after the treatment.

STEP 2: Level of Significance
α = 0.01

STEP 3: Critical Values
α/2 = 0.01/2 = 0.005
df = n − 1 = 5 − 1 = 4
Critical values = ±4.604

STEP 4: Test Statistic
$$t = \frac{\bar{d}}{S_d / \sqrt{n}} = \frac{5.2}{2.59 / \sqrt{5}}$$
$$t = 4.4894$$

STEP 5: Make a conclusion
Since the test statistic (t = 4.4894) falls in the acceptance region (−4.604 < 4.4894 < 4.604), we DO NOT REJECT Ho.
Therefore, we can say that at the 1% level of significance, there is not enough evidence to reject the claim that there is no significant difference in the depression symptoms of participants before and after the treatment.

3.) Suppose that a study was conducted to determine the effect of a vegetarian diet on weight. The data below are the weights (in kilograms) of 7 persons who were on a standard diet and agreed to adopt a vegetarian diet for one month. Weights were recorded before they adopted the diet and one month after.
Is there a significant difference in the weight of the participants before and after they adopted the vegetarian diet? Use 1% level of significance.
Solution:
STEP 1: Construct the null hypothesis
Ho: There is no significant difference in the weight of the participants before and after they adopted the vegetarian diet.
Ha: There is a significant difference in the weight of the participants before and after they adopted the vegetarian diet.

STEP 2: Level of Significance
α = 0.01

STEP 3: Critical Values
α/2 = 0.01/2 = 0.005
df = n − 1 = 7 − 1 = 6
Critical values = ±3.707

STEP 4: Test Statistic
$$t = \frac{\bar{d}}{S_d / \sqrt{n}} = \frac{2.71}{2.06 / \sqrt{7}}$$
$$t = 3.4806$$

STEP 5: Make a conclusion
Since the test statistic (t = 3.4806) falls in the acceptance region (−3.707 < 3.4806 < 3.707), we DO NOT REJECT Ho.
Therefore, we can say that at the 1% level of significance, there is not enough evidence to reject the claim that there is no significant difference in the weight of the participants before and after they adopted the vegetarian diet.

3.1 Linear Correlation
✅ It is used to determine whether a relationship between two continuous variables exists.
✅ It measures the strength (qualitatively) and direction of the linear association between two variables.

3.2 Scatter Diagram
✅ Also called a scatter plot.
✅ It is a plot of pairs of values of two variables in a rectangular coordinate plane, displaying a relationship between the two variables.

3.3 Correlation Coefficient
✅ It is a measure of the strength of the linear association between two variables.
✅ Its value ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation).
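A correlation coefficient can be computed from paired data with the Pearson product-moment formula given in 3.4. The Python sketch below first builds the column sums (n, Σx, Σy, Σxy, Σx², Σy²) used in a hand-computation table, then applies the formula. The helper names are hypothetical; the sums-only check uses the computer-games example later in these notes, and the raw data in the last usage line are made up for illustration.

```python
import math

def column_sums(x: list[float], y: list[float]) -> tuple:
    """Return (n, Sx, Sy, Sxy, Sx2, Sy2), the sums used in a hand-computation table."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    sy2 = sum(yi * yi for yi in y)
    return n, sx, sy, sxy, sx2, sy2

def pearson_from_sums(n, sx, sy, sxy, sx2, sy2) -> float:
    """r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2][n*Sy2 - Sy^2])."""
    numerator = n * sxy - sx * sy
    spread = (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    if spread <= 0:
        raise ValueError("x or y has no variation, so r is undefined")
    return numerator / math.sqrt(spread)

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of paired data."""
    return pearson_from_sums(*column_sums(x, y))

# Computer games vs. Math score: sums as given in the notes
print(pearson_from_sums(5, 15, 75, 175, 55, 1375))              # -1.0

# Hypothetical raw data, for illustration only
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))    # 0.7746
```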
3.4 Pearson Product-Moment Correlation Coefficient
Formula:
$$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n(\Sigma x^2) - (\Sigma x)^2][n(\Sigma y^2) - (\Sigma y)^2]}}$$

Qualitative Interpretation of Correlation Coefficients
Absolute value of correlation coefficient | Strength of linear relationship between X and Y
0 - 0.20 | Very Weak Correlation
0.21 - 0.40 | Weak Correlation
0.41 - 0.60 | Moderate Correlation
0.61 - 0.80 | Strong Correlation
0.81 - 1.0 | Very Strong Correlation

3.5 Coefficient of Determination
✅ It is denoted by r².
✅ It is a measure of the predicting power of the linear model.
✅ It is a value between 0 and 1 that gives the proportion of the total variability in the dependent variable y that could be explained or accounted for by the linear relationship with the independent variable x.

1.) Suppose that data on a random sample of used vans of the same brand and model on mileage (in kilometers) and prices (in pesos) were collected and tabulated, as shown on the next page. Construct a scatter plot of mileage and price. What trend can you observe?

A) Construct a scatter plot of mileage and price.

B) Compute the correlation coefficient.
$$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n(\Sigma x^2) - (\Sigma x)^2][n(\Sigma y^2) - (\Sigma y)^2]}}$$
$$r = \frac{9(208{,}350) - (441)(460)}{\sqrt{[9(25{,}395) - (441)^2][9(2{,}420{,}000) - (4600)^2]}}$$
$$r = -0.994 \quad \text{(correlation coefficient)}$$

C) Compute the Coefficient of Determination.
$$r^2 = (-0.994)^2 = 0.9880$$
$$r^2 = 0.9880 \times 100 = 98.80\%$$
This means that 98.80% of the total variability in the prices of used vans of the same model can be accounted for by the linear relationship with the distances (mileage) traveled by these vans.
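The interpretation table and the coefficient of determination can be applied with a short Python sketch. The helper names are hypothetical, and assigning values that fall between the table's two-decimal bands (for example 0.205) to the lower band is an assumption, since the table does not cover them. The usage line reproduces the used-van figures r = −0.994 and r² = 98.80%.

```python
def coefficient_of_determination(r: float) -> float:
    """r^2: share of the variability in y accounted for by the linear relationship with x."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r ** 2

def strength(r: float) -> str:
    """Qualitative label from the interpretation table, based on |r|.

    Assumption: values between the table's bands (e.g. 0.205) go to the lower band.
    """
    size = abs(r)
    if size <= 0.20:
        return "Very Weak"
    if size <= 0.40:
        return "Weak"
    if size <= 0.60:
        return "Moderate"
    if size <= 0.80:
        return "Strong"
    return "Very Strong"

r = -0.994   # used-van example
print(strength(r), f"{coefficient_of_determination(r):.2%}")   # Very Strong 98.80%
```

The range check also serves as a sanity test: any hand computation that produces |r| > 1 has an arithmetic or transcription error in its sums.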
2.) The table shows the time in hours spent by five students in playing computer games and the scores these students got on a Math test. Solve for Pearson's r and describe the result.
$$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n(\Sigma x^2) - (\Sigma x)^2][n(\Sigma y^2) - (\Sigma y)^2]}}$$
$$r = \frac{5(175) - (15)(75)}{\sqrt{[5(55) - (15)^2][5(1375) - (75)^2]}}$$
$$r = \frac{5(175) - (15)(75)}{\sqrt{[50][1250]}} = \frac{-250}{250}$$
$$r = -1$$
This is a perfect negative correlation: in this sample, the more hours a student spent playing computer games, the lower the Math score.

Interdisciplinary Link:
● Education: In educational research, correlation can be used to study the relationship between teaching methods and student performance.
● Marketing and Business: Marketers often use correlation to assess the relationship between advertising spending and sales figures.
● Psychology and Social Sciences: In psychology, correlation is used to explore relationships between variables, such as the correlation between stress and health, or the relationship between educational attainment and income.

4.1 Regression Analysis
Can these real-life situations be modeled mathematically? Justify.
✅ There is a very high positive correlation (with a coefficient of 0.90) between average study hours per week and Statistics and Probability grade.
- Is it possible to predict your Statistics and Probability grade given your average study hours per week?
- Given the weight of a pregnant mother, can you predict the weight of her infant? Can you forecast the number of deaths caused by lung cancer if you have data on cigarette consumption?

Regression Analysis
- It is a statistical method that determines the nature of the relationship between variables, that is, either positive or negative, linear or non-linear.
- It gives the regression equation that enables us to predict the value of the dependent variable given the value of the independent variable.

Independent Variable - is a standalone variable, which means that its value can change without reference to another variable.
Dependent Variable - is a variable that changes as a result of the change in the independent variable.
Example: A teacher studies how attitude affects the math performance of senior high school students. (Independent variable: attitude; dependent variable: math performance.)

4.2 Regression Line
✅ Also called the line of best fit.
✅ It is the line drawn through a scatter plot, which can be used to find the direction of the association between the two variables.
✅ It divides the points on the scatter plot such that the number of points above it is approximately equal to the number of points below it.

Formulas
$$y' = a + bx$$
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2} \quad \text{(y-intercept)}$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} \quad \text{(slope)}$$

1.) Consider the following data on the height (in inches) and weight (in kilograms) of a random sample of five students in high school.
a.) Construct a scatter plot and determine the regression line that best fits the data.
b.) Compute the coefficients a and b, then determine the equation of the regression line.
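The two sum formulas for a and b can be evaluated with a minimal Python sketch. The helper names are hypothetical. The height/weight table for this example is not reproduced in these notes, so the usage lines use made-up points lying exactly on y = 2 + 3x, which the formulas should recover.

```python
def regression_coefficients(x: list[float], y: list[float]) -> tuple:
    """Return (a, b) for y' = a + bx using the sum formulas from the notes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    denominator = n * sx2 - sx ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sy * sx2 - sx * sxy) / denominator   # y-intercept
    b = (n * sxy - sx * sy) / denominator     # slope
    return a, b

def predict(a: float, b: float, x: float) -> float:
    """Predicted value y' = a + bx."""
    return a + b * x

# Hypothetical points lying exactly on y = 2 + 3x, used only to check the formulas
a, b = regression_coefficients([1, 2, 3, 4], [5, 8, 11, 14])
print(a, b)               # 2.0 3.0
print(predict(a, b, 10))  # 32.0
```

Both coefficients share the denominator n(Σx²) − (Σx)², so the sketch computes it once and rejects data where every x is the same.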