MATH 1281 Written Assignment Unit 6
MATH 1281
March 7, 2021
COMPARING TWO SAMPLES AND LINEAR REGRESSION 2
1. Apply the function "plot" to the formula that relates the response "frequency" to the
explanatory variable "march2007" in order to produce the two box-plots of the
response. Redo the plotting with "frequency" replaced by "log(frequency)". The
distribution of the variable "log(frequency)" is:
__ More symmetric, __ Less symmetric compared to the distribution of the variable
"frequency".
Mark the most appropriate option and attach the R code that produces the two
plots:
>
The box plots of "frequency" are skewed to the right, whereas those of
"log(frequency)" are roughly symmetric, so the distribution of
"log(frequency)" is more symmetric.
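The two box plots described above can be sketched as follows. This is only an illustration under stated assumptions: the data frame `sim` is simulated and stands in for the assignment's "transfusion" data, and the helper `skewness` is a hypothetical name introduced here as a rough numerical check of symmetry.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(1)
n <- 300
group <- rbinom(n, 1, 0.25)
sim <- data.frame(
  march2007 = factor(group),
  frequency = ceiling(rlnorm(n, meanlog = 1.1 + 0.5 * group, sdlog = 0.8))
)

# With a factor on the right-hand side of the formula, plot() draws box plots.
plot(frequency ~ march2007, data = sim)
plot(log(frequency) ~ march2007, data = sim)

# Sample skewness as a rough symmetry check: values near 0 suggest symmetry.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew_raw <- skewness(sim$frequency)
skew_log <- skewness(log(sim$frequency))
c(raw = skew_raw, log = skew_log)
```

With the real data, the same two `plot` calls would be applied with `data = transfusion`, provided "march2007" is stored as a factor.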
2. Mark the null hypotheses that you reject with a significance level of 5% and those
that you do not reject:
(Reject/Don't Reject) H0: The expectation of "frequency" is the same in the two subsets,
(Reject/Don't Reject) H0: The expectation of "log(frequency)" is the same in the two
subsets.
o Both hypotheses are rejected: running the code below gives p-values
less than 0.05 for both tests. Consistent with this, neither of the two
95% confidence intervals for the difference in expectations contains zero.
> t.test(frequency~march2007,data=transfusion)
-4.246285 -1.745712
sample estimates:
4.801754 7.797753
> t.test(log(frequency)~march2007,data=transfusion)
-0.6316889 -0.3256582
sample estimates:
1.178089 1.656762
>
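The decision rule used above can be sketched in code. This is a minimal illustration on simulated data: `sim` is a stand-in for "transfusion", and `excludes_zero` is a hypothetical helper. It shows that rejecting at the 5% level is equivalent to the 95% confidence interval for the difference in means not containing zero.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(2)
n <- 400
group <- rbinom(n, 1, 0.25)
sim <- data.frame(
  march2007 = factor(group),
  frequency = ceiling(rlnorm(n, meanlog = 1.1 + 0.5 * group, sdlog = 0.8))
)

# Welch two-sample t-tests for the response and for its logarithm.
tt_raw <- t.test(frequency ~ march2007, data = sim)
tt_log <- t.test(log(frequency) ~ march2007, data = sim)

# Reject H0 at the 5% level when the p-value is below 0.05.
reject <- c(raw = tt_raw$p.value < 0.05, log = tt_log$p.value < 0.05)

# Equivalent check: the 95% interval for the difference excludes zero.
excludes_zero <- function(ci) ci[1] > 0 || ci[2] < 0
c(raw = excludes_zero(tt_raw$conf.int), log = excludes_zero(tt_log$conf.int))
reject
```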
3. Mark the null hypotheses that you reject with a significance level of 5% and those
that you do not reject:
(Reject/Don't Reject) H0: The variance of "frequency" is the same in the two subsets,
(Reject/Don't Reject) H0: The variance of "log(frequency)" is the same in the two
subsets.
Explain your answer:
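o As the output below shows, the p-value of the test for the response
"frequency" is less than 2.2e-16, and therefore less than 0.05, so the
null hypothesis is rejected.
o On the other hand, for the response "log(frequency)" the 95% confidence
interval for the ratio of variances, [0.738272, 1.191102], contains 1, so
the p-value is greater than 0.05 and the null hypothesis is not rejected.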
> var.test(frequency~march2007,data=transfusion)
0.2725525 0.4397267
sample estimates:
ratio of variances
0.3488348
> var.test(log(frequency)~march2007,data=transfusion)
0.738272 1.191102
sample estimates:
ratio of variances
0.9449005
>
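The reasoning about the ratio of variances can be checked by hand. The sketch below uses simulated data (`sim` is a stand-in for "transfusion", and the simulated groups need not behave like the real ones). It recomputes the reported ratio directly and shows that the test rejects at the 5% level exactly when the 95% interval does not contain 1.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(3)
n <- 400
group <- rbinom(n, 1, 0.25)
sim <- data.frame(
  march2007 = factor(group),
  frequency = ceiling(rlnorm(n, meanlog = 1.1 + 0.5 * group, sdlog = 0.8))
)

vt <- var.test(log(frequency) ~ march2007, data = sim)

# The reported estimate is var(first level) / var(second level).
ratio_by_hand <- with(sim, var(log(frequency[march2007 == "0"])) /
                           var(log(frequency[march2007 == "1"])))

# H0 (ratio equal to 1) is rejected at 5% exactly when the interval misses 1.
ci <- vt$conf.int
reject <- vt$p.value < 0.05
ci_misses_one <- ci[1] > 1 || ci[2] < 1
c(estimate = unname(vt$estimate), by_hand = ratio_by_hand)
```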
Linear Regression:
Q4: Apply the function "plot" to the formula that relates the response "frequency" to the
explanatory variable "time" in order to produce the scatter plot. Add the regression line
to the plot. The variability of the variable "frequency", for larger values of the explanatory
variable, is:
__ Smaller, __ Larger, __ Constant.
Mark the most appropriate option and attach the R code that produces the two
plots:
o [Figure: two scatter plots with "time" on the horizontal axis (0 to 100).]
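A scatter plot with its regression line can be produced along the following lines. This is a sketch on simulated data, not the assignment's output: `sim` is a stand-in for "transfusion", built so that the spread grows with "time" purely for illustration. The split at the median of "time" is a simple assumed way to compare variability for small and large values of the explanatory variable.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(4)
n <- 300
time <- runif(n, 2, 98)
sim <- data.frame(
  time = time,
  frequency = ceiling(rlnorm(n, meanlog = 0.5 + 0.02 * time, sdlog = 0.7))
)

# Scatter plot of the response against the explanatory variable.
fit <- lm(frequency ~ time, data = sim)
plot(frequency ~ time, data = sim)
abline(fit)  # add the fitted regression line

# Compare the spread of the residuals for small and large values of time.
late <- sim$time > median(sim$time)
sd_early <- sd(resid(fit)[!late])
sd_late <- sd(resid(fit)[late])
c(early = sd_early, late = sd_late)
```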
Q5: Mark the null hypotheses that you reject with a significance level of 5% and those
that you do not reject:
(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response
"frequency" is equal to zero,
(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response
"log(frequency)" is equal to zero.
Explain your answer:
o Both hypotheses are rejected: running the code gives p-values less than
0.05 for the slope of "time" in both regressions.
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
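The p-value for the slope can be read from the coefficient table of the fitted model. The sketch below assumes simulated data (`sim` stands in for "transfusion") and uses a hypothetical helper `slope_p`. It also recomputes one p-value by hand from the estimate and its standard error.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(5)
n <- 300
time <- runif(n, 2, 98)
sim <- data.frame(
  time = time,
  frequency = ceiling(rlnorm(n, meanlog = 0.5 + 0.02 * time, sdlog = 0.7))
)

fit_raw <- lm(frequency ~ time, data = sim)
fit_log <- lm(log(frequency) ~ time, data = sim)

# Extract the two-sided p-value for H0: slope of "time" equals zero.
slope_p <- function(fit) summary(fit)$coefficients["time", "Pr(>|t|)"]
p_values <- c(raw = slope_p(fit_raw), log = slope_p(fit_log))
reject <- p_values < 0.05

# The same p-value by hand: t = estimate / standard error, two-sided tail.
row <- summary(fit_log)$coefficients["time", ]
t_stat <- row[["Estimate"]] / row[["Std. Error"]]
p_by_hand <- 2 * pt(abs(t_stat), df = fit_log$df.residual, lower.tail = FALSE)
p_values
```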
An Unpaired Design:
Q6: The 95%-confidence interval of slope of "time" in the regression line of the
response "log(frequency)" is:
2.5 % 97.5 %
>
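A confidence interval with the "2.5 %" and "97.5 %" column labels shown above is what `confint` returns for a fitted linear model. The following sketch uses simulated data (`sim` is a stand-in for "transfusion") and reproduces the interval by hand as the estimate plus or minus a t-quantile times the standard error.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(6)
n <- 300
time <- runif(n, 2, 98)
sim <- data.frame(
  time = time,
  frequency = ceiling(rlnorm(n, meanlog = 0.5 + 0.02 * time, sdlog = 0.7))
)

fit <- lm(log(frequency) ~ time, data = sim)

# 95% confidence interval for the slope of "time".
ci <- confint(fit, "time", level = 0.95)

# The same interval by hand from the coefficient table.
tab <- coef(summary(fit))
est <- tab["time", "Estimate"]
se <- tab["time", "Std. Error"]
ci_by_hand <- est + c(-1, 1) * qt(0.975, df = fit$df.residual) * se
ci
```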
Q7: The regression line between "time" as an explanatory variable and "log(frequency)"
as a response is:
for the regression line between the two variables. The regression line is
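The equation of a regression line is built from the two fitted coefficients, the intercept and the slope. The sketch below shows one way to read them off and print the line. It runs on simulated data (`sim` stands in for "transfusion"), so the printed numbers are not the assignment's answer.

```r
# Simulated stand-in for the "transfusion" data frame (NOT the real data).
set.seed(7)
n <- 300
time <- runif(n, 2, 98)
sim <- data.frame(
  time = time,
  frequency = ceiling(rlnorm(n, meanlog = 0.5 + 0.02 * time, sdlog = 0.7))
)

fit <- lm(log(frequency) ~ time, data = sim)

# Intercept (a) and slope (b) of the line log(frequency) = a + b * time.
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["time"]]
cat(sprintf("log(frequency) = %.4f + %.4f * time\n", a, b))

# The fitted line evaluated at time = 50 agrees with predict().
predicted <- predict(fit, newdata = data.frame(time = 50))
```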
Q8: Apply the function "plot" to the formula that relates the response "frequency" to the
explanatory variable "monetary" in order to produce the scatter plot. Add the regression
line to the plot. The points in the scatter plot are:
__ All on the same line, __ Show a linear trend but are not on the same line, __ Don't
show a linear trend.
Mark the most appropriate option and attach the R code that produces the plot:
(Yakir, 2011). Since the p-value of 0.07721 is greater than 0.05, the null
hypothesis is not rejected.
[Figure: two scatter plots with "monetary" on the horizontal axis; the vertical axis of the first is "frequency" (0 to 50).]
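One way to decide between the three options is to check whether the residuals of the fitted line are numerically zero. The sketch below is an illustration only: `on_one_line` is a hypothetical helper, and both "monetary" variants are simulated. The exact multiple and the noisy version are constructed for contrast and say nothing about the real data.

```r
# TRUE when all points lie on the fitted line, up to rounding error.
on_one_line <- function(x, y, tol = 1e-8) {
  fit <- lm(y ~ x)
  max(abs(resid(fit))) < tol * max(1, max(abs(y)))
}

# Simulated stand-in values (NOT the real data).
set.seed(8)
frequency <- ceiling(rlnorm(100, meanlog = 1.2, sdlog = 0.8))
monetary_exact <- 250 * frequency                       # exact multiple
monetary_noisy <- 250 * frequency + rnorm(100, sd = 100)  # linear trend plus noise

# Scatter plot with the regression line for the noisy case.
plot(frequency ~ monetary_noisy)
abline(lm(frequency ~ monetary_noisy))

exact_result <- on_one_line(monetary_exact, frequency)
noisy_result <- on_one_line(monetary_noisy, frequency)
c(exact = exact_result, noisy = noisy_result)
```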
References
Yakir, B. (2011). Introduction to statistical thinking (with R, without calculus). https://my.uopeople.edu/pluginfile.php/1188709/mod_page/content/31/IntroStat.pdf