Midterm2
Note A. For hypothesis-testing questions, you need to: 1. Explain your parameters
and specify the hypotheses about the parameters. 2. Choose the appropriate test
statistic, write down its formula, and specify its null distribution. 3. Calculate the
critical value for your rejection region. Clearly specify the degrees of freedom of
the quantiles when they come from a t, F, or χ² distribution. 4. State your
statistical conclusion. 5. State the practical conclusion.
Note B. Please conduct hypothesis tests at significance level 0.05 unless
otherwise specified.
Note C. Please use the average value whenever interpolation is needed. For example,
use 0.845 as the 80th percentile of N(0,1).
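The averaging rule in Note C can be sketched as follows. This is only an illustration, assuming a standard normal table printed on a 0.01 grid with four-decimal probabilities; the function name `table_quantile` is hypothetical and not part of the exam.

```python
from statistics import NormalDist


def table_quantile(p):
    """Approximate the p-th quantile of N(0,1) the way one reads a z-table.

    Assumes a table with z on a 0.01 grid and probabilities rounded to
    four decimals. Only the upper half of the table (0.5 <= p < 1) is handled.
    """
    if not 0.5 <= p < 1:
        raise ValueError("this sketch only handles 0.5 <= p < 1")
    std_normal = NormalDist()

    def table_entry(hundredths):
        # Probability printed in the table for z = hundredths / 100.
        return round(std_normal.cdf(hundredths / 100), 4)

    k = 0
    # Walk up the grid until the tabulated probability reaches p.
    while table_entry(k) < p:
        k += 1
    if table_entry(k) == p:
        # p appears in the table exactly, so no interpolation is needed.
        return k / 100
    # Otherwise average the two neighbouring z-values that bracket p.
    return round(((k - 1) / 100 + k / 100) / 2, 3)


print(table_quantile(0.80))
```

For p = 0.80 the bracketing table entries are z = 0.84 and z = 0.85, so the sketch returns their average, 0.845, matching the example in Note C.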
Note D. For any calculation with decimal numbers, results accurate to three
decimal places are sufficient.
Note E. Please make distribution assumptions for your random variables when
needed.
Note F. For quantiles of the t-distribution, if the degrees of freedom exceed 30,
please use the normal quantile as an approximation. You should first specify the
correct t-quantile with the right degrees of freedom and then state the quantile
you will adopt in the normal approximation.
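The two-step reporting asked for in Note F might look like the sketch below. The function name `approx_t_quantile` and the label format are hypothetical, and the degrees of freedom in the usage line (48) are an arbitrary illustration, not a value taken from any question.

```python
from statistics import NormalDist


def approx_t_quantile(df, upper_tail_prob):
    """Return (exact t-quantile label, adopted normal label, adopted value).

    Follows Note F: name the exact t-quantile first, then state the normal
    quantile that replaces it. Only valid when df > 30.
    """
    if df <= 30:
        raise ValueError("df <= 30: read the exact quantile from a t table")
    # Upper-tail quantile of N(0,1), e.g. about 1.960 for a tail of 0.025.
    z = NormalDist().inv_cdf(1 - upper_tail_prob)
    exact_label = f"t_{{{df}, {upper_tail_prob}}}"
    adopted_label = f"z_{{{upper_tail_prob}}}"
    return exact_label, adopted_label, round(z, 3)


print(approx_t_quantile(48, 0.025))
```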
1. (5 points) Does a high value of R² imply that two variables are causally related?
Why or why not?
2. (5 points) A residual plot for a simple linear regression is as follows. What are the
problems that can be detected from the plot?
(a) Misspecification of the mean model (b) Heteroscedasticity (c) Deviation from
normal distribution
3. To investigate the relationship between mileage and sale price for the
2007 model year Camry, data on the mileage and sale price of 19 sales were
collected. The scatter plot below suggests a linear relationship between miles and
price.
A simple linear regression model fitted to the above data, with Miles as the
explanatory variable and Price as the response, produced the following
output.
R² = 0.5387
ANOVA table:
Df Sum Sq Mean Sq F value Pr(>F)
Miles 47.158 (c) 0.000348
Residuals (a) (b)
Coefficient table:
To test if the mean satisfaction scores are all the same across the four job types,
we derive the one-way ANOVA table as follows.
Df SS MS F Pr(>F)
Treatment (a) 4.8661 0.006081
Residuals 4782.6 (c)
Total (b)
(a)–(c) (15 points) Please fill in the cells in the ANOVA table.
(d) (5 points) What are the hypotheses (H0, H1) tested with the above F-statistic?
Please specify the model and the parameter in your hypotheses.
(e) (5 points) Given the significant result above, we need to test which pairs of
jobs differ significantly in satisfaction scores. How should the significance
level for each comparison be adjusted with Bonferroni’s correction?
5. A factorial experiment was designed to test for any significant differences in the
time needed to perform English to foreign language translations with two
computerized language translators. Because the type of language translated was
also considered a significant factor, translations were made with both systems for
three different languages: Spanish, French, and German.
Time (hours)   Language
               Spanish   French   German
System1           8        10       12
                 12        14       16
                 10        12       14
System2           6        14       16
                 10        16       22
                  8        15       19
(a) (7 points) Please make an interaction plot between the system factor and the
language factor. Is there any interaction effect? Please explain your answer.
(b)–(c) (15 points) The following table is a two-way ANOVA table for the analysis.
Please fill in the cells in the table.
Df Sum Sq Mean Sq F value Pr(>F)
language 85.5 0.000161
system 18 0.064206
interaction 4.5 0.034815
Residuals (b) 4.333
Total (c)
(d) (3 points) Please state your conclusions about each main effect and the
interaction effect.
6. (5 points) What are the assumptions made for the one-way analysis of variance?
Appendix
Margin of error:
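$$ t_{n-2,\,\alpha/2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}}, $$

$$ t_{n-2,\,\alpha/2}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}}. $$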
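
As a rough illustration of how the two margin-of-error expressions in this appendix are evaluated, here is a minimal sketch. The function name, the argument names, and the numbers in the usage lines are hypothetical and are not taken from any exam question. The t-quantile is passed in by hand, as it would be read from a table. In the usual reading, the first expression applies to the mean response at x_g and the second, with the extra 1 under the square root, to the prediction of a new observation at x_g.

```python
import math


def margin_of_error(t_quantile, s_eps, n, x_g, x_bar, s_x_sq, prediction=False):
    """Evaluate t * s_eps * sqrt([1 +] 1/n + (x_g - x_bar)^2 / ((n - 1) * s_x^2)).

    t_quantile : t_{n-2, alpha/2}, looked up by the user
    s_eps      : residual standard error of the regression
    s_x_sq     : sample variance of the explanatory variable
    prediction : if True, include the extra 1 (second appendix expression)
    """
    under_root = 1.0 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x_sq)
    if prediction:
        under_root += 1.0
    return t_quantile * s_eps * math.sqrt(under_root)


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
print(margin_of_error(2.0, 1.0, 4, 5.0, 5.0, 2.0))
print(margin_of_error(2.0, 1.0, 4, 5.0, 5.0, 2.0, prediction=True))
```

With x_g equal to the sample mean, the second term under the square root vanishes, so the first call reduces to t · s_ε · √(1/n) and the second to t · s_ε · √(1 + 1/n).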