STAT200 Week6 Homework Solutions
9.1.2
a.) State the random variables and the parameters in words.
x1 = number of female students taking the biology exam
x2 = number of female students taking the calculus AB exam
p1 = proportion of female students taking the biology exam
p2 = proportion of female students taking the calculus AB exam
b.) State and check the assumptions for confidence interval:
(i) A sample of 144,796 students taking the biology exam is taken. A sample of
211,693 students taking the calculus AB exam is taken. Both samples consist of
all students who took each exam during a particular year. This isn’t really a
random sample unless the year was chosen at random, so this assumption
may not have been met.
(ii) The samples are independent since they come from different tests.
(iii) The assumptions for the binomial distribution are satisfied in both
populations, since there are only two responses, there are a fixed number of
trials, the probability of a success is the same, and the trials are independent.
(iv) x1 = 84199, n1 - x1 = 144796 - 84199 = 60597, x2 = 102598, and
n2 - x2 = 211693 - 102598 = 109095 are all greater than or equal to 5. So
both sampling distributions of p̂1 and p̂2 can be approximated with a normal
distribution.
Confidence Interval:
zC = 1.645 [Using Excel: Enter “=NORM.S.INV(0.95)”]
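The interval itself can be sketched in a few lines. This is a minimal illustration, assuming the 90% confidence level implied by “=NORM.S.INV(0.95)” and the usual unpooled standard error for a difference of two proportions; the function name two_proportion_ci is hypothetical and not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Critical value z_C: the area (1 + confidence) / 2 lies to its left.
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_c * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return z_c, diff - margin, diff + margin


# Counts from part (iv); the 0.90 level is an assumption read off the Excel hint.
z_c, lower, upper = two_proportion_ci(84199, 144796, 102598, 211693, 0.90)
print(f"z_C = {z_c:.3f}")
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}")
```

The critical value comes from the standard library's NormalDist, which plays the same role as NORM.S.INV in Excel.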
9.1.5
a.) State the random variables and the parameters in words.
x1 = number of children diagnosed with Autism Spectrum Disorder (ASD) in
Pennsylvania
x2 = number of children diagnosed with ASD in Utah
p1 = proportion of children diagnosed with ASD in Pennsylvania
p2 = proportion of children diagnosed with ASD in Utah
b.) State the null and alternative hypotheses and the level of significance
H0 : p1 = p2 or H0 : p1 - p2 = 0
HA : p1 > p2 or HA : p1 - p2 > 0
α = 0.01
c.) State and check the assumptions for a hypothesis test
i. A sample of diagnoses for 18,440 eight-year-olds in Pennsylvania is taken. A sample
of diagnoses for 2,123 eight-year-olds in Utah is taken. Both samples were taken for
the same year. So unless the year was chosen at random, this assumption may
not have been met.
ii. The samples are independent since they come from different states.
iii. The assumptions for the binomial distribution are satisfied in both populations, since
there are only two responses, there are a fixed number of trials, the probability of a
success is the same, and the trials are independent.
iv. x1 = 245, n1 - x1 = 18440 - 245 = 18195, x2 = 45, and
n2 - x2 = 2123 - 45 = 2078 are all greater than or equal to 5. So both sampling
distributions of p̂1 and p̂2 can be approximated with a normal distribution.
d.) Find the sample statistics, test statistic, and p-value
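Sample Proportion:
n1 = 18440, n2 = 2123
\[ \hat{p}_1 = \frac{245}{18440} \approx 0.0133, \qquad \hat{p}_2 = \frac{45}{2123} \approx 0.0212 \]
\[ \hat{q}_1 = 1 - \frac{245}{18440} = \frac{18195}{18440} \approx 0.9867, \qquad \hat{q}_2 = 1 - \frac{45}{2123} = \frac{2078}{2123} \approx 0.9788 \]
Pooled Sample Proportion, \bar{p}:
\[ \bar{p} = \frac{245 + 45}{18440 + 2123} = \frac{290}{20563} \approx 0.0141 \]
\[ \bar{q} = 1 - \frac{290}{20563} = \frac{20273}{20563} \approx 0.9859 \]
Test Statistic:
\[ z = \frac{(0.0133 - 0.0212) - 0}{\sqrt{\frac{0.0141 \cdot 0.9859}{18440} + \frac{0.0141 \cdot 0.9859}{2123}}} \approx -2.924 \]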
p-value:
Using TI-83/84: normalcdf(-2.924, 1E99, 0, 1) ≈ 0.998
Using Excel: This is a right-tailed test, so we need to find the area to the right of
-2.924. Enter “=1-NORM.S.DIST(-2.924,1)”, and we will get 0.998 as the p-value.
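The calculation in part d can be checked with a short script. This is a sketch only: the function name two_proportion_z_test is hypothetical, and because it carries unrounded proportions through the formula, its z value differs from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed test of H0: p1 = p2 against HA: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion, used because H0 assumes p1 = p2.
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    std_error = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = (p1_hat - p2_hat) / std_error
    # Right-tailed p-value: area under the standard normal curve to the right of z.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = two_proportion_z_test(245, 18440, 45, 2123)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The last line of the function mirrors the Excel entry “=1-NORM.S.DIST(z,1)”.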
e.) Conclusion
Fail to reject H0, since the p-value is greater than 0.01.
f.) Interpretation
There is not enough evidence to show that the proportion of children diagnosed with ASD
in Pennsylvania is more than the proportion of children diagnosed with ASD in Utah.
9.2.3
a.) State the random variables and the parameters in words.
x1 = wholesale prices from east coast fishery
x2 = wholesale prices from west coast fishery
μ1 = mean wholesale prices from east coast fishery
μ2 = mean wholesale prices from west coast fishery
b.) State the null and alternative hypotheses and the level of significance
Let μd = μ1 – μ2
H0 : μd = 0
HA : μd < 0
α = 0.05
c.) State and check the assumptions for the hypothesis test
i. A sample of 9 pairs of wholesale prices from both companies was taken. The
problem did not state whether the sample was random. So it may not be random.
ii. The population of the difference in wholesale prices between east coast and west
coast companies is normally distributed. The histogram looks somewhat bell shaped.
There are no outliers in the difference data set. The probability plot on the differences
looks somewhat linear. So you can assume that the distribution of the difference in
wholesale prices is normally distributed.
[Figure: “Histogram of diff” (x-axis: diff, y-axis: Frequency) and normal probability plot “Difference between Prices” (x-axis: Theoretical Quantiles, y-axis: Difference)]
9.2.6
a.) State the random variables and the parameters in words.
x1 = traffic count on Friday the 6th
x2 = traffic count on Friday the 13th
μ1 = mean traffic count on Friday the 6th
μ2 = mean traffic count on Friday the 13th
b.) State and check the assumptions for the confidence interval
i. A random sample of 10 pairs of traffic counts on Friday the 6th and the 13th was
taken. The problem did not state this, so this assumption may not be true.
ii. The population of the difference in traffic counts between Friday the 6th and the
13th is normally distributed. The histogram looks somewhat bell shaped. There are
no outliers in the difference data set. The probability plot on the differences looks
somewhat linear. So you can assume that the distribution of the difference in traffic
counts is normally distributed.
c.) Find the sample statistic and confidence interval
Sample Statistics:
Dates              6th       13th      d = x1 - x2
1990, July         139246    138548    698
1990, July         134012    132908    1104
1991, September    137055    136018    1037
1991, September    133732    131843    1889
1991, December     123552    121641    1911
1991, December     121139    118723    2416
1992, March        128293    125532    2761
1992, March        124631    120249    4382
1992, November     124609    122770    1839
1992, November     117584    117263    321
\[ \bar{d} \approx 1835.8, \qquad s_d \approx 1176.01 \]
t_c = 1.833 [Using Excel: Enter “=T.INV(0.95,9)”]
\[ E = t_c \frac{s_d}{\sqrt{n}} = 1.833 \cdot \frac{1176.01}{\sqrt{10}} \approx 681.7 \]
\[ \bar{d} - E < \mu_d < \bar{d} + E \]
\[ 1835.8 - 681.7 < \mu_d < 1835.8 + 681.7 \]
\[ 1154.1 < \mu_d < 2517.5 \]
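The sample statistics and interval above can be reproduced from the table. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 1.833 is copied from the solution rather than computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Traffic counts from the table, in row order.
friday_6th = [139246, 134012, 137055, 133732, 123552,
              121139, 128293, 124631, 124609, 117584]
friday_13th = [138548, 132908, 136018, 131843, 121641,
               118723, 125532, 120249, 122770, 117263]

# Paired differences d = x1 - x2.
diffs = [a - b for a, b in zip(friday_6th, friday_13th)]

d_bar = mean(diffs)
s_d = stdev(diffs)   # sample standard deviation (divides by n - 1)
t_c = 1.833          # critical value from the solution (df = 9, 90% level)

margin = t_c * s_d / sqrt(len(diffs))
lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, E = {margin:.1f}")
print(f"{lower:.1f} < mu_d < {upper:.1f}")
```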
d.) Statistical Interpretation: There is a 90% chance that 1154.1 < μd < 2517.5 contains
the true mean difference in traffic counts.
e.) Real World Interpretation: The mean difference in traffic counts between Friday the
6th and Friday the 13th is between 1154.1 and 2517.5.
9.3.1
a.) State the random variables and the parameters in words.
x1 = income of a male in 2013
x2 = income of a female in 2013
μ1 = mean income of a male in 2013
μ2 = mean income of a female in 2013
b.) State the null and alternative hypotheses and the level of significance
The hypotheses would be
H0 : μ1 = μ2 or H0 : μ1 - μ2 = 0
HA : μ1 > μ2 or HA : μ1 - μ2 > 0
α = 0.01
c.) State and check the assumptions for the hypothesis test
i. A random sample of 52 income levels for males in 2013 is taken. A random sample
of 52 income levels for females in 2013 is taken. The problem does not state if either
sample was randomly selected. So this assumption may not be valid.
ii. The two samples are independent since these are different genders.
iii. Population of all income levels for males is normally distributed. The sample size is
30 or more. Population of all income levels for females is normally distributed. The
sample size is 30 or more.
d.) Find the sample statistic, test statistic, and p-value
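Sample Statistic:
\[ \bar{x}_1 = 46453.3,\quad \bar{x}_2 = 36511,\quad s_1 \approx 7030.71,\quad s_2 \approx 6138.12,\quad n_1 = 52,\quad n_2 = 52 \]
Test Statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(46453.3 - 36511) - 0}{\sqrt{\frac{7030.71^2}{52} + \frac{6138.12^2}{52}}} \approx 7.682 \]
Degrees of freedom:
\[ A = \frac{s_1^2}{n_1} = \frac{7030.71^2}{52} \approx 950593.9058 \]
\[ B = \frac{s_2^2}{n_2} = \frac{6138.12^2}{52} \approx 724548.4064 \]
\[ df = \frac{(A + B)^2}{\frac{A^2}{n_1 - 1} + \frac{B^2}{n_2 - 1}} = \frac{(950593.9058 + 724548.4064)^2}{\frac{950593.9058^2}{52 - 1} + \frac{724548.4064^2}{52 - 1}} \approx 100.176 \]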
p-value:
Using TI-83/84: p-value = tcdf(7.682, 1E99, 100.176) ≈ 5.41 × 10^-12
Using Excel: This is a right-tailed test, so we need to find the area to the right of
7.682. Enter “=T.DIST.RT(7.682,100)” and we get the p-value as 5.45E-12.
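The test statistic and degrees of freedom in part d can be verified with a short script. This is a sketch under the same unequal-variance formulas used above; the function name welch_t_and_df is hypothetical, and the p-value is left to the calculator or Excel because the Python standard library has no t distribution.

```python
from math import sqrt


def welch_t_and_df(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic and approximate df without assuming equal variances."""
    a = s1 ** 2 / n1   # A in the solution
    b = s2 ** 2 / n2   # B in the solution
    t = ((mean1 - mean2) - hypothesized_diff) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Sample statistics from part d (males first, then females).
t, df = welch_t_and_df(46453.3, 7030.71, 52, 36511, 6138.12, 52)
print(f"t = {t:.3f}, df = {df:.3f}")
```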
e.) Conclusion
Reject H0 since the p-value < 0.01.
f.) Interpretation
There is enough evidence to show that the mean income of males is more than that of females.
9.3.3
a.) State the random variables and the parameters in words.
x1 = total brain volume (TBV) of patients that are considered normal
x2 = TBV of patients that had schizophrenia
μ1 = mean TBV of patients that are considered normal
μ2 = mean TBV of patients that had schizophrenia
b.) State the null and alternative hypotheses and the level of significance
The hypotheses would be
H0 : μ1 = μ2 or H0 : μ1 - μ2 = 0
HA : μ1 > μ2 or HA : μ1 - μ2 > 0
α = 0.10
c.) State and check the assumptions for the hypothesis test
i. A random sample of 32 TBV of patients that are considered normal is taken. A
random sample of 31 TBV of patients that had schizophrenia is taken. The problem
does not state if either sample was randomly selected, but it was a study that was
completed. So this is safe to assume.
ii. The two samples are independent since these are people with different brain
chemistry.
iii. Population of TBV of patients that are considered normal is normally distributed. The
histogram looks somewhat bell shaped. There are no outliers. The normal probability
plot appears to be linear. So this assumption is probably true.
Population of TBV of patients that had schizophrenia is normally distributed. The
histogram looks somewhat bell shaped. There are no outliers. The normal probability
plot appears to be linear. So this assumption is probably true.
[Figure: “Histogram of bvnormal” (y-axis: Frequency) and normal probability plot “Brain Volume Normal Brain” (x-axis: Theoretical Quantiles)]
[Figure: “Histogram of bvschizophrenic” (y-axis: Frequency) and normal probability plot “Brain Volume Schizophrenic Brain” (x-axis: Theoretical Quantiles, y-axis: Sample Quantiles)]
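Test Statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
Degrees of freedom:
\[ df = \frac{(A + B)^2}{\frac{A^2}{n_1 - 1} + \frac{B^2}{n_2 - 1}}, \quad \text{where } A = \frac{s_1^2}{n_1} \text{ and } B = \frac{s_2^2}{n_2} \]
\[ A = \frac{s_1^2}{n_1} = \frac{125458^2}{32} \approx 491865930.1 \]
\[ B = \frac{s_2^2}{n_2} = \frac{171932^2}{31} \approx 953568149.2 \]
Therefore, df ≈ 56.3.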
p-value:
Using Excel: This is a right-tailed test, we need to find the area to the right of
0.3168. Enter “=T.DIST.RT(0.3168,56.3)”, and we get the p-value as 0.3763.
e.) Conclusion
Fail to reject H0 since the p-value > 0.10.
f.) Interpretation
There is not enough evidence to show that patients with schizophrenia have less TBV on
average than patients who are considered normal.
9.3.4
a.) State the random variables and the parameters in words.
x1 = total brain volume (TBV) of patients that are considered normal
x2 = TBV of patients that had schizophrenia
μ1 = mean TBV of patients that are considered normal
μ2 = mean TBV of patients that had schizophrenia
b.) State and check the assumptions for the confidence interval
The assumptions were stated and checked in problem # 9.3.3.
c.) Find the sample statistic and confidence interval
Sample Statistic:
Confidence Interval:
The confidence interval estimate of the difference μ1 - μ2 is
df = 56.3 ≈ 56
tc = 1.673 [Using Excel: Enter “=T.INV.2T(0.1,56)”]
9.3.8
a.) State the random variables and the parameters in words.
x1 = number of cell phones per 100 residents in countries of Europe
x2 = number of cell phones per 100 residents in countries of the Americas
μ1 = mean number of cell phones per 100 residents in countries of Europe
μ2 = mean number of cell phones per 100 residents in countries of the Americas
b.) State and check the assumptions for the confidence interval
i. A random sample of number of cell phones per 100 residents in 53 countries of
Europe is taken. A random sample of number of cell phones per 100 residents in 39
countries of the Americas is taken. The problem does not state if either sample was
randomly selected. So this assumption may not be valid.
ii. The two samples are independent since these are different parts of the world.
iii. Population of number of cell phones in all countries of Europe is normally
distributed. The sample size is 30 or more. Population of number of cell phones in all
countries of the Americas is normally distributed. The sample size is 30 or more.
c.) Find the sample statistic and confidence interval
Sample Statistic:
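\[ \bar{x}_1 = 108.151,\quad \bar{x}_2 = 87.2051,\quad s_1 \approx 29.965,\quad s_2 \approx 35.1554,\quad n_1 = 53,\quad n_2 = 39 \]
\[ A = \frac{s_1^2}{n_1} = \frac{29.965^2}{53} \approx 16.941533 \]
\[ B = \frac{s_2^2}{n_2} = \frac{35.1554^2}{39} \approx 31.68898 \]
\[ df = \frac{(A + B)^2}{\frac{A^2}{n_1 - 1} + \frac{B^2}{n_2 - 1}} = \frac{(16.941533 + 31.68898)^2}{\frac{16.941533^2}{53 - 1} + \frac{31.68898^2}{39 - 1}} \approx 74.0298 \]
Confidence Interval:
The confidence interval estimate of the difference μ1 - μ2 is
df = 74.0298 ≈ 74
t_c ≈ 2.378
[Using Excel: Enter “=T.INV.2T(0.02,74)”]
\[ E = t_c \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 2.378 \sqrt{\frac{29.965^2}{53} + \frac{35.1554^2}{39}} \approx 16.583 \]
\[ (\bar{x}_1 - \bar{x}_2) - E < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + E \]
\[ (108.151 - 87.2051) - 16.583 < \mu_1 - \mu_2 < (108.151 - 87.2051) + 16.583 \]
\[ 4.3629 < \mu_1 - \mu_2 < 37.5289 \]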
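The degrees of freedom, margin of error, and interval above can be checked numerically. This is a minimal sketch with illustrative variable names; the critical value 2.378 is copied from the solution rather than computed, and small differences in the last decimal places come from rounding in the intermediate values.

```python
from math import sqrt

# Sample statistics from part c.
x_bar1, s1, n1 = 108.151, 29.965, 53     # Europe
x_bar2, s2, n2 = 87.2051, 35.1554, 39    # the Americas

a = s1 ** 2 / n1   # A in the solution
b = s2 ** 2 / n2   # B in the solution
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

t_c = 2.378        # critical value from the solution (98% level, df = 74)
margin = t_c * sqrt(a + b)

diff = x_bar1 - x_bar2
lower, upper = diff - margin, diff + margin
print(f"df = {df:.2f}, E = {margin:.3f}")
print(f"{lower:.3f} < mu1 - mu2 < {upper:.3f}")
```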
d.) Statistical Interpretation: There is a 98% chance that 4.3629 < μ1 - μ2 < 37.5289
contains the true difference in means.
e.) Real World Interpretation: The mean number of cell phones per 100 residents in
countries of Europe is anywhere from 4.3629 to 37.5289 more than the mean number
of cell phones per 100 residents in countries of the Americas.
11.3.2
a.) State the random variables and the parameters in words
x1 = percentage difference of waste between the layout on the computer and the actual
waste when the clothing is made (called run-up) from plant 1
x2 = percentage difference of waste between the layout on the computer and the actual
waste of run-up from plant 2
x3 = percentage difference of waste between the layout on the computer and the actual
waste of run-up from plant 3
x4 = percentage difference of waste between the layout on the computer and the actual
waste of run-up from plant 4
x5 = percentage difference of waste between the layout on the computer and the actual
waste of run-up from plant 5
μ1 = mean percentage difference of waste between the layout on the computer and the
actual waste of run-up from plant 1
μ2 = mean percentage difference of waste between the layout on the computer and the
actual waste of run-up from plant 2
μ3 = mean percentage difference of waste between the layout on the computer and the
actual waste of run-up from plant 3
μ4 = mean percentage difference of waste between the layout on the computer and the
actual waste of run-up from plant 4
μ5 = mean percentage difference of waste between the layout on the computer and the
actual waste of run-up from plant 5
b.) State the null and alternative hypotheses and the level of significance
H0 : μ1 = μ2 = μ3 = μ4 = μ5
HA : at least two of the means are not equal
α = 0.01
c.) State and check the assumptions for the hypothesis test
i. A random sample of 22 percentage differences of waste between the layout on the
computer and the actual waste of run-up from plant 1 was taken. A random sample of
22 percentage differences of waste between the layout on the computer and the actual
waste of run-up from plant 2 was taken. A random sample of 19 percentage
differences of waste between the layout on the computer and the actual waste of run-
up from plant 3 was taken. A random sample of 19 percentage differences of waste
between the layout on the computer and the actual waste of run-up from plant 4 was
taken. A random sample of 13 percentage differences of waste between the layout on
the computer and the actual waste of run-up from plant 5 was taken. The problem does
not state whether the samples were random, so these statements may not be true,
although it may be safe to assume that they are.
ii. Since the jeans are made in different plants, the samples are independent.
iii. Population of all percentage differences of waste between the layout on the computer
and the actual waste of run-up from plant 1 is normally distributed. Population of all
percentage differences of waste between the layout on the computer and the actual
waste of run-up from plant 2 is normally distributed. Population of all percentage
differences of waste between the layout on the computer and the actual waste of run-
up from plant 3 is normally distributed. Population of all percentage differences of
waste between the layout on the computer and the actual waste of run-up from plant 4
is normally distributed. Population of all percentage differences of waste between the
layout on the computer and the actual waste of run-up from plant 5 is normally
distributed. Looking at the histograms and normal probability plots for each sample, it
appears that some of the populations are approximately normally distributed and
some are not. So this assumption may not be met.
iv. The population variances are all equal. The sample standard deviations are
approximately 10.03, 15.35, 4.40, 3.66, and 9.56 respectively. This assumption does
not appear to be met, since the sample standard deviations are very different.
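One quick way to quantify “very different” is to compare the largest and smallest sample standard deviations. The sketch below assumes a common rule of thumb (the largest should be no more than about twice the smallest); that threshold is an assumption for illustration and is not stated in the original solution.

```python
# Sample standard deviations from item iv, keyed by plant.
sample_sds = {
    "plant 1": 10.03,
    "plant 2": 15.35,
    "plant 3": 4.40,
    "plant 4": 3.66,
    "plant 5": 9.56,
}

largest = max(sample_sds.values())
smallest = min(sample_sds.values())
ratio = largest / smallest

# Assumed rule of thumb: treat the variances as roughly equal only if
# the largest standard deviation is at most twice the smallest.
roughly_equal = ratio <= 2.0
print(f"largest / smallest = {ratio:.2f}, roughly equal: {roughly_equal}")
```

A ratio well above 2 is consistent with the conclusion that the equal-variance assumption does not appear to be met.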
d.) Find the test statistic and p-value
We need to leverage technology for this part. There are two applets:
(1) One is based on summary data (mean and standard deviation for each group). The
instructions are located in the LEO classroom as well as at
http://statpages.info/anova1sm.html
(2) The other is based on the raw data and calculates the sample means and sample
standard deviations along with the ANOVA result. The instructions are in the LEO classroom
as well as at
http://vassarstats.net/anova1u.html
11.3.4
a.) State the random variables and the parameters in words
x1 = percent difference of calories between measured and labeled reduced calorie food
that is nationally advertised
x2 = percent difference of calories between measured and labeled reduced calorie food
that is regionally distributed
x3 = percent difference of calories between measured and labeled reduced calorie food
that is prepared locally
μ1 = mean percent difference of calories between measured and labeled reduced calorie
foods that are nationally advertised
μ2 = mean percent difference of calories between measured and labeled reduced calorie
foods that are regionally distributed
μ3 = mean percent difference of calories between measured and labeled reduced calorie
foods that are prepared locally
b.) State the null and alternative hypotheses and the level of significance
H0 : μ1 = μ2 = μ3
HA : at least two of the means are not equal
α = 0.10
c.) State and check the assumptions for the hypothesis test
i. A random sample of 20 percent differences of calories between measured and labeled
reduced calorie foods that are nationally advertised was taken. A random sample of 12
percent differences of calories between measured and labeled reduced calorie foods
that are regionally distributed was taken. A random sample of 8 percent differences of
calories between measured and labeled reduced calorie foods that are prepared locally
was taken. The problem does not state whether the samples were random, so these
statements may not be true, although it may be safe to assume that they are.
ii. Since the foods are prepared in different locations, the samples are independent.
iii. Population of all percent differences of calories between measured and labeled
reduced calorie foods that are nationally advertised is normally distributed. Population
of all percent differences of calories between measured and labeled reduced calorie
foods that are regionally distributed is normally distributed. Population of all percent
differences of calories between measured and labeled reduced calorie foods that are
prepared locally is normally distributed. Looking at the histograms and normal
probability plots for each sample, it appears that most of the populations are not
normally distributed.
iv. The population variances are all equal. The sample standard deviations are
approximately 10.52, 16.07, and 83.97 respectively. This assumption appears to not
be met, since the sample standard deviations are very different.
d.) Find the test statistic and p-value
We need to leverage technology for this part. There are two applets:
(1) One is based on summary data (mean and standard deviation for each group). The
instructions are located in the LEO classroom as well as at
http://statpages.info/anova1sm.html
(2) The other is based on the raw data and calculates the sample means and sample
standard deviations along with the ANOVA result. The instructions are in the LEO classroom
as well as at
http://vassarstats.net/anova1u.html
The test statistic using technology is F ≈ 12.979 and p-value < 0.0001.
e.) Conclusion
Reject H0 since the p-value is less than 0.10.
f.) Interpretation
There is evidence to indicate that at least two of the mean percent differences between the
three groups are different.
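For readers who want to see what a summary-data ANOVA tool computes, here is a minimal sketch of the one-way ANOVA F statistic built from group sizes, means, and standard deviations. The function name anova_f_from_summary and the numbers in the example call are hypothetical, chosen only so the arithmetic is easy to follow; they are not the data from this problem, whose group means are not listed in the solution.

```python
def anova_f_from_summary(sizes, means, sds):
    """One-way ANOVA F statistic from per-group sizes, means, and standard deviations."""
    k = len(sizes)
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-group variation: recovered from each group's sample variance.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical illustration: three groups of 3, means 1, 2, 3, all with sd 1.
f_stat, df_between, df_within = anova_f_from_summary(
    [3, 3, 3], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
)
print(f"F = {f_stat:.3f} with df = ({df_between}, {df_within})")
```

The p-value would then come from the F distribution with those degrees of freedom, which is the step the applets perform.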