Yadunandan Sharma 500826933 MTH480 Due Date: April 15, 2021
1)
> chisq.test(as.table(c(78,81,28,16,105)), p= c(0.234,0.225,0.056,0.051,0.434))
p-value = 0.00408 < 0.05, so the null hypothesis that the proportions equal the given theoretical
values is rejected. There is sufficient sample evidence to indicate that the proportions of deaths
at this hospital differ significantly from the proportions in the population at large.
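As a check on the test above, the statistic can be reproduced by hand. This is a minimal sketch using the counts and hypothesised proportions from the chisq.test call; the variable names (observed, p0, chi_sq) are illustrative and not part of the original answer.

```r
# Observed counts and hypothesised proportions from question 1
observed <- c(78, 81, 28, 16, 105)
p0 <- c(0.234, 0.225, 0.056, 0.051, 0.434)

# Expected counts under the null hypothesis: n * p_i
expected <- sum(observed) * p0

# Pearson chi-square statistic, compared with chi-square on k - 1 df
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(statistic = chi_sq, df = df, p.value = p_value))
```

All expected counts here exceed 5, so the chi-square approximation is reasonable.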
2)
> B<-matrix(c(67,128,26,63,16,46), nrow = 2, ncol=3)
> B
     [,1] [,2] [,3]
[1,]   67   26   16
[2,]  128   63   46
> chisq.test(as.table(B))
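The test of independence above compares each observed cell with an expected count built from the row and column totals. The following sketch reproduces that calculation for the matrix B; the names expected, chi_sq and p_value are illustrative.

```r
# Contingency table from question 2 (filled column by column)
B <- matrix(c(67, 128, 26, 63, 16, 46), nrow = 2, ncol = 3)

# Expected count for cell (i, j): row total * column total / grand total
expected <- outer(rowSums(B), colSums(B)) / sum(B)

# Pearson statistic with (r - 1)(c - 1) degrees of freedom
chi_sq <- sum((B - expected)^2 / expected)
df <- (nrow(B) - 1) * (ncol(B) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(expected, 2))
print(c(statistic = chi_sq, df = df, p.value = p_value))
```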
c)
> 0.23205420 * 102 # expected frequency of the 1st cell
[1] 23.66953
d) > chisq.test(as.table(c(24,30,31,11,6)),
+ p = c (0.23205420, 0.33898114, 0.24758916, 0.12055812, 0.06081738))
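The probabilities in part d) are numerically consistent with a Poisson distribution whose mean is estimated from the same counts, with categories 0, 1, 2, 3 and "4 or more". That reading is an inference from the numbers, not something stated in the answer, and the sketch below rests on it. It also assumes the last category is scored as exactly 4 when estimating the mean.

```r
# Observed counts for the categories 0, 1, 2, 3 and "4 or more" (assumed labels)
counts <- c(24, 30, 31, 11, 6)
values <- 0:4

# Estimated Poisson mean, treating the last category as exactly 4 (assumption)
lambda_hat <- sum(values * counts) / sum(counts)

# Category probabilities: P(0), ..., P(3) and the upper tail P(X >= 4)
p_hat <- c(dpois(0:3, lambda_hat), ppois(3, lambda_hat, lower.tail = FALSE))
expected <- sum(counts) * p_hat

# Pearson statistic; one extra degree of freedom is lost for estimating lambda
chi_sq <- sum((counts - expected)^2 / expected)
df_adj <- length(counts) - 1 - 1
p_value <- pchisq(chi_sq, df = df_adj, lower.tail = FALSE)

print(round(p_hat, 8))
print(c(statistic = chi_sq, df = df_adj, p.value = p_value))
```

If this reading is right, note that chisq.test with a fixed p uses 4 degrees of freedom, whereas the usual textbook adjustment for one estimated parameter gives 5 - 1 - 1 = 3, so the two p-values can differ.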
1) a)
> PE<-c(1.65,1.72,1.50,1.37,1.60,1.70,1.85,1.46,2.05,1.80,1.40,
1.75,1.38,1.65,1.55,2.10,1.95,1.65,1.88,2.00)
> Plant<-c(rep("T1",5),rep("T2",5),rep("T3",5),rep("T4",5))
> Plant<-factor(Plant)
> aovobject<- aov(PE~Plant)
> anova(aovobject)
Response: PE
Df Sum Sq Mean Sq F value Pr(>F)
Plant 3 0.4649 0.15496 5.2002 0.01068 *
Residuals 16 0.4768 0.02980
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the p-value of 0.01068 is smaller than 0.05, we reject the null hypothesis of equal means at
the 0.05 significance level. This means that the mean weight of effluent discharged differs
significantly for at least one pair of plants.
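The F value in the ANOVA table can be rebuilt from the between-group and within-group sums of squares. This is a minimal sketch on the same PE data; the names ss_between, ss_within and f_stat are illustrative.

```r
# Data from part a): four plants, five observations each
PE <- c(1.65, 1.72, 1.50, 1.37, 1.60, 1.70, 1.85, 1.46, 2.05, 1.80, 1.40,
        1.75, 1.38, 1.65, 1.55, 2.10, 1.95, 1.65, 1.88, 2.00)
Plant <- factor(rep(c("T1", "T2", "T3", "T4"), each = 5))

# Between-group sum of squares: group sizes times squared deviations of group means
group_means <- tapply(PE, Plant, mean)
group_sizes <- tapply(PE, Plant, length)
ss_between <- sum(group_sizes * (group_means - mean(PE))^2)

# Within-group sum of squares: deviations of each value from its own group mean
ss_within <- sum((PE - ave(PE, Plant))^2)

df_between <- nlevels(Plant) - 1
df_within <- length(PE) - nlevels(Plant)
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(c(F = f_stat, p.value = p_value))
```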
b)
> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level
The adjusted p-values for T4-T1 and T4-T3 are below 0.05, so the mean for T4 is significantly greater than the means for T1 and T3.
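The significant pairs can be pulled out of the Tukey table programmatically instead of being read off by eye. This sketch refits the one-way model on the same data; the object names fit, tk and sig are illustrative.

```r
PE <- c(1.65, 1.72, 1.50, 1.37, 1.60, 1.70, 1.85, 1.46, 2.05, 1.80, 1.40,
        1.75, 1.38, 1.65, 1.55, 2.10, 1.95, 1.65, 1.88, 2.00)
Plant <- factor(rep(c("T1", "T2", "T3", "T4"), each = 5))
fit <- aov(PE ~ Plant)

# TukeyHSD returns one matrix per factor with columns diff, lwr, upr, "p adj"
tk <- TukeyHSD(fit)$Plant

# Keep only the comparisons whose adjusted p-value is below 0.05
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]
print(sig)
```

A positive value in the diff column means the first treatment named in the row has the larger sample mean.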
2)
> Cost<-c(736,836,1492,996,745,725,1384,884,668,618,1214,
802,1065,869,1502,1571,1202,1172,1682,1272)
> company <-factor(c(rep("T1",4),rep("T2",4),rep("T3",4),rep("T4",4), rep("T5",4)))
> Location<- factor(c( 1:4,1:4,1:4,1:4,1:4))
> aovobject<- aov(Cost~company+Location)
> anova(aovobject)
Analysis of Variance Table
Response: Cost
Df Sum Sq Mean Sq F value Pr(>F)
company 4 731309 182827 12.204 0.0003432 ***
Location 3 1176270 392090 26.173 1.499e-05 ***
Residuals 12 179769 14981
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the p-value for company is 0.0003432, which is less than 0.05, we reject the null hypothesis.
There is enough evidence to conclude that the cost of insurance varies among companies.
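The p-value for company can be read directly from the fitted object, which avoids transcription slips when copying small numbers. This sketch refits the randomized block model on the same data; fit, tab and p_company are illustrative names.

```r
Cost <- c(736, 836, 1492, 996, 745, 725, 1384, 884, 668, 618, 1214,
          802, 1065, 869, 1502, 1571, 1202, 1172, 1682, 1272)
company <- factor(rep(c("T1", "T2", "T3", "T4", "T5"), each = 4))
Location <- factor(rep(1:4, times = 5))

# Company is the treatment, Location is the block
fit <- aov(Cost ~ company + Location)
tab <- anova(fit)

# F = MS(company) / MS(residuals), with 4 and 12 degrees of freedom
f_company <- tab["company", "Mean Sq"] / tab["Residuals", "Mean Sq"]
p_company <- pf(f_company, tab["company", "Df"], tab["Residuals", "Df"],
                lower.tail = FALSE)

print(c(F = f_company, p.value = p_company))
```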
> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level
$company
diff lwr upr p adj
T2-T1 -80.50 -356.36224 195.36224 0.8798100
T3-T1 -189.50 -465.36224 86.36224 0.2472482
T4-T1 236.75 -39.11224 512.61224 0.1062787
T5-T1 317.00 41.13776 592.86224 0.0221540
T3-T2 -109.00 -384.86224 166.86224 0.7190153
T4-T2 317.25 41.38776 593.11224 0.0220443
T5-T2 397.50 121.63776 673.36224 0.0045549
T4-T3 426.25 150.38776 702.11224 0.0026313
T5-T3 506.50 230.63776 782.36224 0.0006076
T5-T4 80.25 -195.61224 356.11224 0.8809555
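b) The adjusted p-values for T2-T1, T3-T1, T4-T1, T3-T2 and T5-T4 are greater than 0.05, so these pairs are not significantly different. The remaining pairs are significantly different: T5>T1, T4>T2, T5>T2, T4>T3, T5>T3.
In increasing order of sample mean: Mu3 Mu2 Mu1 Mu4 Mu5. The groups that are not statistically different are (Mu3, Mu2, Mu1), (Mu1, Mu4) and (Mu4, Mu5).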
$Location
diff lwr upr p adj
2-1 -39.2 -269.022225 190.6222 0.9560003
3-1 571.6 341.777775 801.4222 0.0000435
4-1 221.8 -8.022225 451.6222 0.0597573
3-2 610.8 380.977775 840.6222 0.0000224
4-2 261.0 31.177775 490.8222 0.0247886
4-3 -349.8 -579.622225 -119.9778 0.0033905
c) Since the p-value for location is 1.499e-05, much less than 0.05, we reject the null hypothesis.
There is enough evidence to conclude that the cost of insurance varies among locations.
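For the grouping of company means in part b), it helps to sort the sample means and list the pairs that Tukey's procedure does not separate. This is a sketch on the same data; the names means_sorted and not_diff are illustrative.

```r
Cost <- c(736, 836, 1492, 996, 745, 725, 1384, 884, 668, 618, 1214,
          802, 1065, 869, 1502, 1571, 1202, 1172, 1682, 1272)
company <- factor(rep(c("T1", "T2", "T3", "T4", "T5"), each = 4))
Location <- factor(rep(1:4, times = 5))
fit <- aov(Cost ~ company + Location)

# Company sample means in increasing order
means <- tapply(Cost, company, mean)
means_sorted <- means[order(means)]
print(means_sorted)

# Pairs whose adjusted p-value is at least 0.05 are not statistically different
tk <- TukeyHSD(fit, "company")$company
not_diff <- rownames(tk)[tk[, "p adj"] >= 0.05]
print(not_diff)
```

Companies joined by a not-different pair belong under a common underline in the ordered list of means.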
3)
a) Completely randomized design. Two factors: cost and supplier. Cost has two levels and
supplier has 4 levels.
b)
> Rating<-c(76,74,69,74,69,68,72,71,71,75,73,73,69,67,64,69,64,60,71,71,70,72,71,70)
> Cost<-factor(c(rep("T1",12), rep("T2",12)))
> Supplier<- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,1,1,1,2,2,2,3,3,3,4,4,4))
> aovobject<-aov(Rating~Cost*Supplier)
> anova(aovobject)
Response: Rating
Df Sum Sq Mean Sq F value Pr(>F)
Cost 1 92.042 92.042 13.8931 0.001833 **
Supplier 3 81.125 27.042 4.0818 0.024902 *
Cost:Supplier 3 33.458 11.153 1.6834 0.210531
Residuals 16 106.000 6.625
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
c) Since the p-value of 0.2105 is greater than 0.05, we do not reject the null hypothesis of no
interaction at the 0.05 significance level. There is insufficient sample evidence to conclude that
the factors cost and supplier interact.
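d) Yes, there is an effect due to the supplier: the main effect is significant at the 0.05 level (F = 4.0818, p-value = 0.0249).
e) Yes, there is an effect due to the cost: the main effect is significant at the 0.05 level (F = 13.8931, p-value = 0.0018).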
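The answers to c), d) and e) are easier to interpret alongside the cell and marginal means. The sketch below tabulates them from the same Rating data; cell_means, cost_means and supplier_means are illustrative names.

```r
Rating <- c(76, 74, 69, 74, 69, 68, 72, 71, 71, 75, 73, 73,
            69, 67, 64, 69, 64, 60, 71, 71, 70, 72, 71, 70)
Cost <- factor(rep(c("T1", "T2"), each = 12))
Supplier <- factor(rep(rep(1:4, each = 3), times = 2))

# Mean rating in each Cost x Supplier cell (3 replicates per cell)
cell_means <- tapply(Rating, list(Cost = Cost, Supplier = Supplier), mean)
print(round(cell_means, 2))

# Marginal means for each factor, used to judge the main effects
cost_means <- tapply(Rating, Cost, mean)
supplier_means <- tapply(Rating, Supplier, mean)
print(round(cost_means, 2))
print(round(supplier_means, 2))
```

Roughly parallel rows in the cell-means table are what one would expect when the interaction is not significant.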
> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level
$Cost
diff lwr upr p adj
T2-T1 -3.916667 -6.144249 -1.689084 0.0018334
$Supplier
diff lwr upr p adj
2-1 -2.500000 -6.7516077 1.751608 0.3643388
3-1 1.166667 -3.0849410 5.418274 0.8601187
4-1 2.500000 -1.7516077 6.751608 0.3643388
3-2 3.666667 -0.5849410 7.918274 0.1039730
4-2 5.000000 0.7483923 9.251608 0.0185696
4-3 1.333333 -2.9182743 5.584941 0.8063964
*The only significant difference is between suppliers 2 (B) and 4 (D) (p adj = 0.0186).
$`Cost:Supplier`
diff lwr upr p adj
T2:1-T1:1 -6.3333333 -13.6093432 0.9426765 0.1129895
T1:2-T1:1 -2.6666667 -9.9426765 4.6093432 0.8976115
T2:2-T1:1 -8.6666667 -15.9426765 -1.3906568 0.0140595
T1:3-T1:1 -1.6666667 -8.9426765 5.6093432 0.9911124
T2:3-T1:1 -2.3333333 -9.6093432 4.9426765 0.9451981
T1:4-T1:1 0.6666667 -6.6093432 7.9426765 0.9999756
T2:4-T1:1 -2.0000000 -9.2760099 5.2760099 0.9752592
T1:2-T2:1 3.6666667 -3.6093432 10.9426765 0.6614247
T2:2-T2:1 -2.3333333 -9.6093432 4.9426765 0.9451981
T1:3-T2:1 4.6666667 -2.6093432 11.9426765 0.3896498
T2:3-T2:1 4.0000000 -3.2760099 11.2760099 0.5673822
T1:4-T2:1 7.0000000 -0.2760099 14.2760099 0.0638225
T2:4-T2:1 4.3333333 -2.9426765 11.6093432 0.4752993
T2:2-T1:2 -6.0000000 -13.2760099 1.2760099 0.1484555
T1:3-T1:2 1.0000000 -6.2760099 8.2760099 0.9996326
T2:3-T1:2 0.3333333 -6.9426765 7.6093432 0.9999998
T1:4-T1:2 3.3333333 -3.9426765 10.6093432 0.7517157
T2:4-T1:2 0.6666667 -6.6093432 7.9426765 0.9999756
T1:3-T2:2 7.0000000 -0.2760099 14.2760099 0.0638225
T2:3-T2:2 6.3333333 -0.9426765 13.6093432 0.1129895
T1:4-T2:2 9.3333333 2.0573235 16.6093432 0.0075749
T2:4-T2:2 6.6666667 -0.6093432 13.9426765 0.0852319
T2:3-T1:3 -0.6666667 -7.9426765 6.6093432 0.9999756
T1:4-T1:3 2.3333333 -4.9426765 9.6093432 0.9451981
T2:4-T1:3 -0.3333333 -7.6093432 6.9426765 0.9999998
T1:4-T2:3 3.0000000 -4.2760099 10.2760099 0.8321663
T2:4-T2:3 0.3333333 -6.9426765 7.6093432 0.9999998
T2:4-T1:4 -2.6666667 -9.9426765 4.6093432 0.8976115
> summary(gradeversusscore)
Call:
lm(formula = examgrade ~ testscore)
Residuals:
Min 1Q Median 3Q Max
-10.813 -5.629 -2.531 6.758 12.234
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.7842 8.5069 4.794 0.00137 **
testscore 0.7656 0.1750 4.375 0.00236 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[Figure: scatterplot, x-axis: testscore]
c) Yes, the data contribute important information for the prediction of the final grade.
The p-value for the slope is 0.002365, much smaller than 0.05, so the null hypothesis that the slope is zero is rejected.
Therefore, there is enough evidence to suggest that the test score is a useful linear predictor of the exam grade.
d) > confint(gradeversusscore)
2.5 % 97.5 %
(Intercept) 21.1672977 60.401013
testscore 0.3620458 1.169078
From the output, the 95% confidence interval for the slope is 0.3620 to 1.1691.
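Both the slope test in c) and the interval in d) follow from the estimate and standard error in the coefficient table. The sketch below rebuilds them from those two printed numbers. The raw data are not shown in this section, so the residual degrees of freedom (8) are an assumption, chosen because they reproduce the reported p-value and interval.

```r
# Slope estimate and standard error copied from the summary() output
estimate <- 0.7656
std_error <- 0.1750
df_resid <- 8  # assumption: n - 2 with n = 10 observations

# t statistic and two-sided p-value for H0: slope = 0
t_stat <- estimate / std_error
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

print(c(t = t_stat, p.value = p_value))
print(ci)
```

Because the inputs are rounded to four decimals, the results agree with the printed output only to about three decimal places.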
[Figure: residual plot, x-axis: fitted(model)]
f) model = lm(examgrade ~ testscore)
model.m = rstandard(model)