
Yadunandan Sharma

500826933
MTH480
Due date: April 15th, 2021
1)
> chisq.test(as.table(c(78,81,28,16,105)), p= c(0.234,0.225,0.056,0.051,0.434))

Chi-squared test for given probabilities


data: as.table(c(78, 81, 28, 16, 105))
X-squared = 15.321, df = 4, p-value = 0.00408

Since p-value = 0.00408 < 0.05, the null hypothesis that the proportions take the stated theoretical values is rejected. There is sufficient sample evidence to indicate that the proportions of deaths at this hospital differ significantly from the proportions in the population at large.

2)
> B<-matrix(c(67,128,26,63,16,46), nrow = 2, ncol=3)
> B
[,1] [,2] [,3]
[1,] 67 26 16
[2,] 128 63 46
> chisq.test(as.table(B))

Pearson's Chi-squared test


data: as.table(B)
X-squared = 1.8857, df = 2, p-value = 0.3895
Since p-value = 0.3895 > 0.05, the null hypothesis of independence is not rejected. There is insufficient sample evidence to conclude that the frequency of fatal accidents depends on the size of the car.
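Under independence, each expected count is (row total × column total) / grand total; a short sketch of the statistic behind this output:

```r
B    <- matrix(c(67, 128, 26, 63, 16, 46), nrow = 2, ncol = 3)
expd <- outer(rowSums(B), colSums(B)) / sum(B)   # expected counts under independence
X2   <- sum((B - expd)^2 / expd)                 # Pearson statistic
df   <- (nrow(B) - 1) * (ncol(B) - 1)            # (2-1)*(3-1) = 2
round(X2, 4)                                     # 1.8857, as reported above
```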
3)
a) > L = (24*0 + 30*1 + 31*2 + 11*3 + 6*4)/(24 + 30 + 31 + 11 + 6)
> L
[1] 1.460784
b) > r = (0*24 + 1*30 + 2*31 + 3*11 + 4*6)/102
> p = c(dpois(0,r), dpois(1,r), dpois(2,r), dpois(3,r),
+ 1 - (dpois(0,r) + dpois(1,r) + dpois(2,r) + dpois(3,r)))
> p
[1] 0.23205420 0.33898114 0.24758916 0.12055812 0.06081738

c)
> 0.23205420 * 102   # expected frequency of the 1st cell
[1] 23.66953
> 0.33898114 * 102   # expected frequency of the 2nd cell
[1] 34.57608
> 0.24758916 * 102   # expected frequency of the 3rd cell
[1] 25.25409
> 0.12055812 * 102   # expected frequency of the 4th cell
[1] 12.29693
> 0.06081738 * 102   # expected frequency of the 5th cell
[1] 6.203373
*None of the expected frequencies is less than 5, hence there is no need to combine any of the cells.
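The five multiplications above can equivalently be done in one vectorized step (a sketch, reusing the estimate of λ from part b):

```r
r <- (0*24 + 1*30 + 2*31 + 3*11 + 4*6) / 102     # lambda-hat = 1.460784
p <- c(dpois(0:3, r), 1 - sum(dpois(0:3, r)))    # cell probabilities
expected <- p * 102                              # all expected frequencies at once
round(expected, 5)  # 23.66953 34.57608 25.25409 12.29693 6.20337
all(expected >= 5)  # TRUE, so no cells need to be combined
```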

d) > chisq.test(as.table(c(24,30,31,11,6)),
+ p = c (0.23205420, 0.33898114, 0.24758916, 0.12055812, 0.06081738))

Chi-squared test for given probabilities

data: as.table(c(24, 30, 31, 11, 6))


X-squared = 2.061, df = 4, p-value = 0.7245
Since p-value = 0.7245 > 0.05, the null hypothesis is not rejected. There is insufficient sample evidence to conclude that the number of power failures does not follow a Poisson distribution with parameter λ = 1.46; the Poisson model is a plausible fit for these data.
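One refinement worth noting: because λ was estimated from the same sample, the usual convention subtracts one additional degree of freedom, giving df = k − 1 − 1 = 3 rather than the df = 4 used by chisq.test. A quick check that the conclusion is unchanged:

```r
X2    <- 2.061                                         # statistic from the test above
p_adj <- pchisq(X2, df = 5 - 1 - 1, lower.tail = FALSE) # df = k - 1 - (estimated params)
p_adj > 0.05                                           # TRUE: still fail to reject
```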
Assignment 2

1) a)
> PE<-c(1.65,1.72,1.50,1.37,1.60,1.70,1.85,1.46,2.05,1.80,1.40,
1.75,1.38,1.65,1.55,2.10,1.95,1.65,1.88,2.00)
> Plant<-c(rep("T1",5),rep("T2",5),rep("T3",5),rep("T4",5))
> Plant<-factor(Plant)
> aovobject<- aov(PE~Plant)
> anova(aovobject)

Analysis of Variance Table

Response: PE
Df Sum Sq Mean Sq F value Pr(>F)
Plant 3 0.4649 0.15496 5.2002 0.01068 *
Residuals 16 0.4768 0.02980
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the p-value of 0.01068 is smaller than 0.05, we reject the null hypothesis at the 0.05 significance level. This means that the mean weights of effluent discharges differ significantly for at least one pair of plants.

b)
> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = PE ~ Plant)


$Plant

diff lwr upr p adj
T2-T1 0.204 -0.10836258 0.51636258 0.2796961
T3-T1 -0.022 -0.33436258 0.29036258 0.9969809
T4-T1 0.348 0.03563742 0.66036258 0.0264399
T3-T2 -0.226 -0.53836258 0.08636258 0.2047920
T4-T2 0.144 -0.16836258 0.45636258 0.5647642
T4-T3 0.370 0.05763742 0.68236258 0.0176858

Since the adjusted p-values for T4-T1 (0.0264) and T4-T3 (0.0177) are below 0.05, plant T4 discharges significantly more than plants T1 and T3. Ordering the sample means gives μ3 < μ1 < μ2 < μ4, where {μ3, μ1, μ2} are not statistically different from one another, and neither are {μ2, μ4}.
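The Tukey differences can be traced back to the treatment means, for example with tapply (a sketch, redefining the data from part a):

```r
PE <- c(1.65,1.72,1.50,1.37,1.60, 1.70,1.85,1.46,2.05,1.80,
        1.40,1.75,1.38,1.65,1.55, 2.10,1.95,1.65,1.88,2.00)
Plant <- factor(rep(c("T1","T2","T3","T4"), each = 5))
means <- tapply(PE, Plant, mean)                 # per-plant mean discharge
round(means, 3)                                  # T1 1.568, T2 1.772, T3 1.546, T4 1.916
round(means["T4"] - means["T1"], 3)              # 0.348, the T4-T1 difference above
```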

2)
> Cost<-c(736,836,1492,996,745,725,1384,884,668,618,1214,
802,1065,869,1502,1571,1202,1172,1682,1272)
> company <-factor(c(rep("T1",4),rep("T2",4),rep("T3",4),rep("T4",4), rep("T5",4)))
> Location<- factor(c( 1:4,1:4,1:4,1:4,1:4))
> aovobject<- aov(Cost~company+Location)
> anova(aovobject)
Analysis of Variance Table

Response: Cost
Df Sum Sq Mean Sq F value Pr(>F)
company 4 731309 182827 12.204 0.0003432 ***
Location 3 1176270 392090 26.173 1.499e-05 ***
Residuals 12 179769 14981
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the p-value for company is 0.0003432, which is less than 0.05, we reject the null hypothesis; there is enough evidence to conclude that the cost of insurance varies among companies.

> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = Cost ~ company + Location)

$company
diff lwr upr p adj
T2-T1 -80.50 -356.36224 195.36224 0.8798100
T3-T1 -189.50 -465.36224 86.36224 0.2472482
T4-T1 236.75 -39.11224 512.61224 0.1062787
T5-T1 317.00 41.13776 592.86224 0.0221540
T3-T2 -109.00 -384.86224 166.86224 0.7190153
T4-T2 317.25 41.38776 593.11224 0.0220443
T5-T2 397.50 121.63776 673.36224 0.0045549
T4-T3 426.25 150.38776 702.11224 0.0026313
T5-T3 506.50 230.63776 782.36224 0.0006076
T5-T4 80.25 -195.61224 356.11224 0.8809555

b) The comparisons T5-T1, T4-T2, T5-T2, T4-T3 and T5-T3 have adjusted p-values below 0.05, so T5 > T1, T4 > T2, T5 > T2, T4 > T3 and T5 > T3. Ordering the sample means gives μ3 < μ2 < μ1 < μ4 < μ5, where {μ3, μ2, μ1}, {μ1, μ4} and {μ4, μ5} are not statistically different within each group.
$Location
diff lwr upr p adj
2-1 -39.2 -269.022225 190.6222 0.9560003
3-1 571.6 341.777775 801.4222 0.0000435
4-1 221.8 -8.022225 451.6222 0.0597573
3-2 610.8 380.977775 840.6222 0.0000224
4-2 261.0 31.177775 490.8222 0.0247886
4-3 -349.8 -579.622225 -119.9778 0.0033905

c) Since the p-value for location is 1.499e-05, much less than 0.05, we reject the null hypothesis; there is enough evidence to conclude that the cost of insurance varies among locations.

3)
a) This is a 2 × 4 factorial experiment laid out in a completely randomized design, with two factors: cost (two levels) and supplier (four levels).

b)
> Rating<-c(76,74,69,74,69,68,72,71,71,75,73,73,69,67,64,69,64,60,71,71,70,72,71,70)
> Cost<-factor(c(rep("T1",12), rep("T2",12)))
> Supplier<- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,1,1,1,2,2,2,3,3,3,4,4,4))
> aovobject<-aov(Rating~Cost*Supplier)
> anova(aovobject)

Analysis of Variance Table

Response: Rating
Df Sum Sq Mean Sq F value Pr(>F)
Cost 1 92.042 92.042 13.8931 0.001833 **
Supplier 3 81.125 27.042 4.0818 0.024902 *
Cost:Supplier 3 33.458 11.153 1.6834 0.210531
Residuals 16 106.000 6.625
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

c) Since the p-value of 0.2105 for the interaction is greater than 0.05, we do not reject the null hypothesis of no interaction at the 0.05 significance level. There is insufficient sample evidence to conclude that the factors cost and supplier interact.
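The lack of interaction can also be seen in the cell means (a quick sketch; interaction.plot would display the same information graphically):

```r
Rating <- c(76,74,69,74,69,68,72,71,71,75,73,73,
            69,67,64,69,64,60,71,71,70,72,71,70)
Cost     <- factor(rep(c("T1","T2"), each = 12))
Supplier <- factor(rep(rep(1:4, each = 3), times = 2))
cellmeans <- tapply(Rating, list(Cost, Supplier), mean)  # 2 x 4 table of cell means
round(cellmeans, 2)
# The T1 row lies above the T2 row for every supplier, i.e. the cost effect
# points the same way at each supplier level, consistent with no interaction.
```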
d) Yes, there is an effect due to the supplier (p = 0.0249 < 0.05).
e) Yes, there is an effect due to the cost (p = 0.0018 < 0.05).

> TukeyHSD(aovobject)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = Rating ~ Cost * Supplier)

$Cost
diff lwr upr p adj
T2-T1 -3.916667 -6.144249 -1.689084 0.0018334

$Supplier
diff lwr upr p adj
2-1 -2.500000 -6.7516077 1.751608 0.3643388
3-1 1.166667 -3.0849410 5.418274 0.8601187
4-1 2.500000 -1.7516077 6.751608 0.3643388
3-2 3.666667 -0.5849410 7.918274 0.1039730
4-2 5.000000 0.7483923 9.251608 0.0185696
4-3 1.333333 -2.9182743 5.584941 0.8063964
*The only significant pairwise difference among suppliers is between supplier 2 and supplier 4 (adjusted p = 0.0186).

$`Cost:Supplier`
diff lwr upr p adj
T2:1-T1:1 -6.3333333 -13.6093432 0.9426765 0.1129895
T1:2-T1:1 -2.6666667 -9.9426765 4.6093432 0.8976115
T2:2-T1:1 -8.6666667 -15.9426765 -1.3906568 0.0140595
T1:3-T1:1 -1.6666667 -8.9426765 5.6093432 0.9911124
T2:3-T1:1 -2.3333333 -9.6093432 4.9426765 0.9451981
T1:4-T1:1 0.6666667 -6.6093432 7.9426765 0.9999756
T2:4-T1:1 -2.0000000 -9.2760099 5.2760099 0.9752592
T1:2-T2:1 3.6666667 -3.6093432 10.9426765 0.6614247
T2:2-T2:1 -2.3333333 -9.6093432 4.9426765 0.9451981
T1:3-T2:1 4.6666667 -2.6093432 11.9426765 0.3896498
T2:3-T2:1 4.0000000 -3.2760099 11.2760099 0.5673822
T1:4-T2:1 7.0000000 -0.2760099 14.2760099 0.0638225
T2:4-T2:1 4.3333333 -2.9426765 11.6093432 0.4752993
T2:2-T1:2 -6.0000000 -13.2760099 1.2760099 0.1484555
T1:3-T1:2 1.0000000 -6.2760099 8.2760099 0.9996326
T2:3-T1:2 0.3333333 -6.9426765 7.6093432 0.9999998
T1:4-T1:2 3.3333333 -3.9426765 10.6093432 0.7517157
T2:4-T1:2 0.6666667 -6.6093432 7.9426765 0.9999756
T1:3-T2:2 7.0000000 -0.2760099 14.2760099 0.0638225
T2:3-T2:2 6.3333333 -0.9426765 13.6093432 0.1129895
T1:4-T2:2 9.3333333 2.0573235 16.6093432 0.0075749
T2:4-T2:2 6.6666667 -0.6093432 13.9426765 0.0852319
T2:3-T1:3 -0.6666667 -7.9426765 6.6093432 0.9999756
T1:4-T1:3 2.3333333 -4.9426765 9.6093432 0.9451981
T2:4-T1:3 -0.3333333 -7.6093432 6.9426765 0.9999998
T1:4-T2:3 3.0000000 -4.2760099 10.2760099 0.8321663
T2:4-T2:3 0.3333333 -6.9426765 7.6093432 0.9999998
T2:4-T1:4 -2.6666667 -9.9426765 4.6093432 0.8976115

Only T2:2-T1:1 (adjusted p = 0.0141) and T1:4-T2:2 (adjusted p = 0.0076) fall below 0.05. Both significant contrasts involve a change in the cost level, which suggests that the differences are driven by cost rather than by supplier.
Assignment 3
a)
> testscore= c(39,43,21,64,57,47,28,75,34,52)
> examgrade = c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)
> gradeversusscore <-lm(examgrade~testscore)
> summary (gradeversusscore)

Call:
lm(formula = examgrade ~ testscore)

Residuals:
Min 1Q Median 3Q Max
-10.813 -5.629 -2.531 6.758 12.234

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.7842 8.5069 4.794 0.00137 **
testscore 0.7656 0.1750 4.375 0.00236 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.704 on 8 degrees of freedom


Multiple R-squared: 0.7052, Adjusted R-squared: 0.6684
F-statistic: 19.14 on 1 and 8 DF, p-value: 0.002365
> anova(gradeversusscore)
Analysis of Variance Table
Response: examgrade
Df Sum Sq Mean Sq F value Pr(>F)
testscore 1 1449.97 1449.97 19.141 0.002365 **
Residuals 8 606.03 75.75
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> plot(testscore,examgrade)
> abline(lm(examgrade ~ testscore), col='red')
[Scatterplot of examgrade versus testscore, with the fitted regression line overlaid in red]

b) The estimated regression line is examgrade = 40.7842 + 0.7656 × testscore.
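A hypothetical use of the fitted line: predicting the exam grade for a pre-test score of 50 (the value 50 is illustrative, not from the data):

```r
testscore <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)
examgrade <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)
fit  <- lm(examgrade ~ testscore)
pred <- predict(fit, newdata = data.frame(testscore = 50))
round(pred, 2)   # roughly 40.7842 + 0.7656*50 = 79.06
```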

c) Yes, the data contribute important information for predicting the final exam grade: the p-value of 0.002365 is much smaller than 0.05, so we reject the null hypothesis that the slope is zero. Therefore, there is enough evidence to suggest that the pre-test score is a useful predictor of the exam grade.
d) > confint(gradeversusscore)
2.5 % 97.5 %
(Intercept) 21.1672977 60.401013
testscore 0.3620458 1.169078
From the output it is visible that the 95% confidence interval for the slope is 0.3620 to 1.169
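The interval for the slope can be reproduced from the summary output as estimate ± t(0.975, 8) × SE (a sketch using the rounded values reported above):

```r
b1 <- 0.7656; se <- 0.1750        # slope estimate and std. error from summary()
tcrit <- qt(0.975, df = 8)        # about 2.306
c(b1 - tcrit*se, b1 + tcrit*se)   # approx. 0.362 to 1.169, matching confint()
```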

e) > model <- lm(examgrade ~ testscore)
> residual <- resid(model)
> plot(fitted(model), residual, main = 'residual vs fit')
> abline(0, 0)
*From the plot, the constant-variance condition appears to be violated.
[Residual-versus-fitted plot titled 'residual vs fit': residuals range roughly from -10 to 10 across fitted values 60 to 90]
f) > model = lm(examgrade ~ testscore)
> model.m = rstandard(model)
> qqnorm(model.m, ylab = "Standardized Residuals", xlab = "Normal Scores", main = "Normal prob")
> qqline(model.m)
*It can be concluded that the residuals are, at least approximately, normally distributed.
