STAT200 Week7 Homework Solutions
10.1.2
x = house value
y = rental income
[Figure: scatter plot "Rental Income vs House Value"; x-axis: House Value, y-axis: Rental Income]
In Excel, enter the house value in a column as x-values, and rental income in a column as
y-values.
Use =INTERCEPT(y-values, x-values) to get the intercept, and =SLOPE(y-values, x-values)
to get the slope.
Or use an applet. The following applet gives the regression equation and the correlation
coefficient:
https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php
ŷ = 0.0244x + 5364
x = $230,000: ŷ = 0.0244(230,000) + 5364 = $10,976
x = $400,000: ŷ = 0.0244(400,000) + 5364 = $15,124
The predicted rental income for x = $230,000 is probably closer to the true value because it
is an interpolation, while the prediction for x = $400,000 is an extrapolation.
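The two predictions above can be reproduced with a short script. This is only a sketch: the slope and intercept are the rounded values from the regression equation above, and the function name predict_rental_income is made up for illustration.

```python
# Rounded coefficients taken from the regression equation y-hat = 0.0244x + 5364.
SLOPE = 0.0244
INTERCEPT = 5364


def predict_rental_income(house_value):
    """Evaluate the fitted line at a given house value (in dollars)."""
    return SLOPE * house_value + INTERCEPT


for value in (230_000, 400_000):
    prediction = predict_rental_income(value)
    print(f"x = ${value:,}: predicted rental income = ${prediction:,.0f}")
```

The script only evaluates the fitted line; it does not check whether a house value lies inside the range of the data, so the interpolation versus extrapolation judgment still has to be made from the scatter plot.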
10.1.4
x = health expenditure (% of GDP)
y = prenatal care (%)
[Figure: scatter plot "Prenatal Care vs Health Expenditure"; y-axis: Prenatal Care (%), x-axis ticks from 4 to 10]
In Excel, enter the percentage GDP spent on health expenditure in a column as x-values,
and the percentage of women receiving prenatal care as y-values.
Use =INTERCEPT(y-values, x-values) to get the intercept, and =SLOPE(y-values, x-values)
to get the slope.
Or use an applet. The following applet gives the regression equation and the correlation
coefficient:
https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php
ŷ = 1.661x + 69.739
x = 5.0: ŷ = 1.661(5.0) + 69.739 = 78.044%
x = 12.0: ŷ = 1.661(12.0) + 69.739 = 89.67%
The prenatal care percent for 5.0% health expenditure is probably closer to the true value
than the one for 12.0 because it is interpolation, while the one for 12.0 is extrapolation.
10.2.2
x = house value
y = rental income
In Excel, use =CORREL(x-values, y-values) to get the correlation coefficient.
Or use an applet. The following applet gives the regression equation and the correlation
coefficient:
https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php
Correlation coefficient: 0.7647, which is a moderate, positive correlation
Coefficient of determination is the square of the correlation coefficient: 0.7647² = 0.585;
58.5% of the variability in the rental income is accounted for by the value of the house
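The squaring step can be checked with a few lines of code. This is a minimal sketch that starts from the correlation coefficient reported above; the function name is made up for illustration.

```python
def coefficient_of_determination(r):
    """Return r squared, the share of variability in y accounted for by x."""
    return r ** 2


r = 0.7647  # correlation coefficient reported for house value vs rental income
r_squared = coefficient_of_determination(r)
print(f"r^2 = {r_squared:.3f}, or {r_squared:.1%} of the variability")
```

The same function applies to any correlation coefficient between -1 and 1.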
10.2.4
x = health expenditure (% of GDP)
y = prenatal care (%)
In Excel, use =CORREL(x-values, y-values) to get the correlation coefficient.
Or use an applet. The following applet gives the regression equation and the correlation
coefficient:
https://www.mathportal.org/calculators/statistics-calculator/correlation-and-regression-calculator.php
Correlation coefficient: 0.1715, which is a weak, positive correlation
Coefficient of determination is the square of the correlation coefficient: 0.1715² = 0.0294;
2.94% of the variability in the percent of prenatal care is accounted for by the health expenditure
as a percent of GDP
10.3.2
x = house value
y = rental income
a.)
1. State the null and alternative hypotheses and the level of significance
Ho: ρ = 0
HA: ρ > 0
α = 0.05
2. State and check the assumptions for the hypothesis test
a. A random sample of house values and rental incomes was taken.
This is not stated, so it is unclear whether this assumption holds.
b. The rental income is normally distributed for every value of the
house.
i. Look at the scatter plot of rental income versus value of the house. It looks fairly
linear.
ii. No points appear to be outliers.
iii. The residual plot for rental income versus value of the house appears to be fairly
random.
The normality assumption appears to be met.
[Figure: residual plot "Residuals versus House Value"; x-axis: House Value, y-axis: residuals]
4. Conclusion
Reject Ho since the p-value is less than 0.05.
5. Interpretation
There is enough evidence to show that there is a positive correlation between house value
and rental amount.
10.3.4
x = health expenditure (% of GDP)
y = prenatal care (%)
a.)
1. State the null and alternative hypotheses and the level of significance
Ho: ρ = 0
HA: ρ > 0
α = 0.05
2. State and check the assumptions for the hypothesis test
a. A random sample of the percentage spent on health expenditure and the percentage of
women receiving prenatal care was taken. This is not stated, so it is unclear whether this
assumption holds.
b. The percentage of women receiving prenatal care is normally distributed for every
value of the percentage spent on health expenditure.
i. Look at the scatter plot of the percentage of women receiving prenatal care versus
the percentage spent on health expenditure. It looks somewhat
linear.
ii. One point appears to be an outlier. It is removed from the rest of the
analysis.
iii. The residual plot for the percentage of women receiving prenatal care versus the
percentage spent on health expenditure appears to be fairly random.
The normality assumption appears to be met.
[Figure: residual plot "Residuals vs Health Expenditure"; y-axis: Residuals, x-axis ticks from 4 to 10]
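p-value:
Enter “=T.DIST.2T(0.6277,13)” in Excel to get a p-value of 0.5411. T.DIST.2T returns the
two-tailed probability; for the one-tailed alternative stated above, the p-value is half of
that, about 0.27, and the conclusion is the same.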
4. Conclusion
Fail to reject Ho since the p-value is greater than 0.05.
5. Interpretation
There is not enough evidence to show that there is a correlation between percentage spent
on health expenditure and the percentage of women receiving prenatal care.
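The value 0.6277 passed to T.DIST.2T can be recovered from the correlation coefficient with the usual test statistic t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below assumes r = 0.1715 (from 10.2.4) and n = 15; the sample size is not stated here and is inferred from the 13 degrees of freedom in the Excel formula.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.1715  # correlation coefficient from 10.2.4
n = 15      # assumption: inferred from df = n - 2 = 13

t = correlation_t_statistic(r, n)
print(f"t = {t:.4f} with df = {n - 2}")
```

The p-value itself still comes from a t-distribution function such as Excel's T.DIST.2T; the script only covers the test statistic.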
11.1.2
a.) State the null and alternative hypotheses and the level of significance
Ho: the activity and the time period are independent for dolphins
HA: the activity and the time period are dependent for dolphins
α = 0.01
b.) State and check the assumptions for the hypothesis test
i. A random sample of the activity and the time period for dolphins was taken.
ii. Expected frequencies for each cell should be greater than or equal to 5. Not all
expected frequencies meet this requirement. Three are below 5, so this assumption is
not met and the results of the hypothesis test may not be valid.
c.) Find the test statistic and p-value
Test statistic:
O     E      O-E     (O-E)^2   (O-E)^2 / E
6     14.9   -8.9    79.21     5.316107383
28    33.5   -5.5    30.25     0.902985075
38    23.6   14.4    207.36    8.786440678
6     3.1    2.9     8.41      2.712903226
4     7.0    -3      9         1.285714286
5     4.9    0.1     0.01      0.002040816
14    4.7    9.3     86.49     18.40212766
0     10.7   -10.7   114.49    10.7
9     7.5    1.5     2.25      0.3
13    16.3   -3.3    10.89     0.66809816
56    36.8   19.2    368.64    10.0173913
10    25.9   -15.9   252.81    9.761003861
Total        0.1               χ² = 68.8548
p-value:
df = (3 - 1) * (4 - 1) = 6
Using TI-83/84: p-value = χ²cdf(68.8548, 1E99, 6) ≈ 7.02 × 10^-13
Using the applet: once you have entered the values, click "compute" to get the calculation result.
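The test statistic and p-value can also be checked without a calculator or applet. The sketch below uses the observed and expected frequencies from the table. For the p-value it relies on a closed form of the chi-square upper tail that holds only for even degrees of freedom, P(X ≥ x) = e^(−x/2) · Σ (x/2)^k / k! for k = 0, ..., df/2 − 1. The function names are made up for illustration.

```python
import math

# Observed and expected frequencies, in the order of the table rows.
observed = [6, 28, 38, 6, 4, 5, 14, 0, 9, 13, 56, 10]
expected = [14.9, 33.5, 23.6, 3.1, 7.0, 4.9, 4.7, 10.7, 7.5, 16.3, 36.8, 25.9]


def chi_square_statistic(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi_square_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable; valid only for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires even, positive df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


chi2 = chi_square_statistic(observed, expected)
df = (3 - 1) * (4 - 1)
p_value = chi_square_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.4f}, df = {df}, p-value = {p_value:.3g}")
```

For odd degrees of freedom this shortcut does not apply, and a statistics library or the TI-83/84 χ²cdf function is the safer choice.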
11.1.4
a.) State the null and alternative hypotheses and the level of significance
Ho: Educational attainment and age are independent
HA: Educational attainment and age are dependent
α = 0.05
b.) State and check the assumptions for the hypothesis test
i. A random sample of educational attainment and age was taken.
ii. Expected frequencies for each cell are greater than or equal to 5. All expected
frequencies are more than 5.
c.) Find the test statistic and p-value
Test statistic:
O       E         O-E       (O-E)^2       (O-E)^2 / E
5416    11541.1   -6125.1   37516850.01   3250.717004
16431   13537.2   2893.8    8374078.44    618.5975268
8555    6843.9    1711.1    2927863.21    427.8062523
9771    8250.9    1520.1    2310704.01    280.0547831
5030    5762.1    -732.1    535970.41     93.01650613
1855    6758.7    -4903.7   24046273.69   3557.825276
5576    3416.9    2159.1    4661712.81    1364.310577
7596    4119.4    3476.6    12086747.56   2934.103889
5777    6389.2    -612.2    374788.84     58.65974457
9435    7494.3    1940.7    3766316.49    502.5574757
3124    3788.8    -664.8    441959.04     116.6488176
3904    4567.7    -663.7    440497.69     96.43752655
7606    6330.0    1276.0    1628176       257.2157978
8795    7424.9    1370.1    1877174.01    252.8214535
2524    3753.7    -1229.7   1512162.09    402.8457495
3109    4525.4    -1416.4   2006188.96    443.317488
13746   7552.7    6193.3    38356964.89   5078.576521
7558    8859.0    -1301.0   1692601       191.0600519
2503    4478.8    -1975.8   3903785.64    871.6141913
2483    5399.5    -2916.5   8505972.25    1575.32591
Total             -0.2                    χ² = 22373.5125
p-value:
df = (5 - 1) * (4 - 1) = 12
Using TI-83/84: p-value = χ²cdf(22373.5125, 1E99, 12) ≈ 0
Using the applet as described in #11.1.2:
http://turner.faculty.swau.edu/mathematics/math241/materials/contablecalc/
11.2.4
a.) State the null and alternative hypotheses and the level of significance
Ho: Deaths from cardiovascular disease in females are in the same proportion as all
deaths for the different age groups
HA: Deaths from cardiovascular disease in females are not in the same proportion as all
deaths for the different age groups
α = 0.05
b.) State and check the assumptions for the hypothesis test
i. A random sample is taken. This is not stated, so it may not be true.
ii. Expected frequencies for each cell are greater than or equal to 5.
c.) Find the test statistic and p-value
Age     Cardiovascular   Expected     Expected        O-E      (O-E)^2 / E
        Frequency (O)    proportion   Frequency (E)
5-14    8                0.1          51.3            -43.3    36.55
15-29   16               0.12         61.56           -45.56   33.72
30-49   56               0.26         133.38          -77.38   44.89
50-69   433              0.52         266.76          166.24   103.6
Total   513                                                    218.76
d.) Conclusion
Reject Ho since the p-value is less than 0.05.
e.) Interpretation
There is enough evidence to show that deaths from cardiovascular disease in females are
not in the same proportion as all deaths of females for the different age groups.
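The expected frequencies and the test statistic in the table for this problem can be reproduced as follows. This is a sketch under the table's own numbers: each expected frequency is the total number of cardiovascular deaths (513) times the expected proportion for that age group. The variable names are made up for illustration.

```python
# Observed cardiovascular deaths and expected proportions, by age group.
observed = {"5-14": 8, "15-29": 16, "30-49": 56, "50-69": 433}
proportions = {"5-14": 0.10, "15-29": 0.12, "30-49": 0.26, "50-69": 0.52}

total = sum(observed.values())

chi2 = 0.0
for age, o in observed.items():
    e = total * proportions[age]          # expected frequency E
    contribution = (o - e) ** 2 / e       # this group's share of chi-square
    chi2 += contribution
    print(f"{age:>5}: O = {o:3d}, E = {e:6.2f}, (O-E)^2/E = {contribution:.2f}")

df = len(observed) - 1
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Summing the unrounded contributions gives a total that matches the table's 218.76 to within rounding.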
11.2.6
a.) State the null and alternative hypotheses and the level of significance
Ho: The reasons for choosing a car are equally likely
HA: The reasons for choosing a car are not equally likely
α = 0.05
b.) State and check the assumptions for the hypothesis test
i. A random sample is taken. This is not stated, so it may not be true.
ii. Expected frequencies for each cell are greater than or equal to 5.
c.) Find the test statistic and p-value
              O    E    O-E   (O-E)^2   (O-E)^2 / E
Safety        84   50   34    1156      23.12
Reliability   62   50   12    144       2.88
Cost          46   50   -4    16        0.32
Performance   34   50   -16   256       5.12
Comfort       47   50   -3    9         0.18
Looks         27   50   -23   529       10.58
Total                   0               χ² ≈ 42.2