AYUSHI PATEL
A044
40525200045
R Software – Internal Assignment (Semester 2)
1. Regression line of Sales on Advertisement & Competition
i. > sales=c(27,23,31,45,47,42,39,45,57,59,73,84)
> adv=c(20,20,25,28,29,28,31,34,35,36,41,45)
> comp=c(10,15,15,15,20,25,35,35,20,30,20,20)
> reg =lm(sales~adv+comp)
> reg
Call:
lm(formula = sales ~ adv + comp)
Coefficients:
(Intercept)          adv         comp
   -18.7958       2.5248      -0.5449
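As a cross-check on the lm() output, the least-squares coefficients can also be obtained directly from the normal equations (X'X)b = X'y. This is a minimal base-R sketch; the names `X` and `beta` are illustrative and not part of the original assignment.

```r
# Data from part i
sales <- c(27, 23, 31, 45, 47, 42, 39, 45, 57, 59, 73, 84)
adv   <- c(20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45)
comp  <- c(10, 15, 15, 15, 20, 25, 35, 35, 20, 30, 20, 20)

# Design matrix: a column of 1s for the intercept, then the two predictors
X <- cbind(Intercept = 1, adv = adv, comp = comp)

# Solve the normal equations (X'X) b = X'y for the least-squares coefficients
beta <- drop(solve(crossprod(X), crossprod(X, sales)))
print(round(beta, 4))

# Compare with the coefficients reported by lm()
print(round(coef(lm(sales ~ adv + comp)), 4))
```

lm() itself uses a QR decomposition rather than solving X'X directly, which is numerically more stable; for a small, well-conditioned problem like this one the two approaches agree.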
ii. > names(reg)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
> fitted.values(reg)
       1        2        3        4        5        6        7        8
26.25119 23.52657 36.15062 43.72506 43.52525 38.27582 40.40102 47.97545
       9       10       11       12
58.67412 55.74969 73.82298 83.92222
> residuals(reg)
          1           2           3           4           5           6
 0.74881225 -0.52657047 -5.15062467  1.27494281  3.47474925  3.72417737
          7           8           9          10          11          12
-1.40102059 -2.97545311 -1.67411578  3.25030794 -0.82298082  0.07777582
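The fitted values and residuals above follow directly from the coefficients: each fitted value is b0 + b1*adv + b2*comp, and each residual is observed sales minus the fitted value. A small base-R sketch (the names `fitted_manual` and `resid_manual` are illustrative) rebuilds both and checks that the residuals sum to zero, as they must for a least-squares fit with an intercept. It also confirms that the residual for observation 11 (73 - 73.82298) is negative.

```r
sales <- c(27, 23, 31, 45, 47, 42, 39, 45, 57, 59, 73, 84)
adv   <- c(20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45)
comp  <- c(10, 15, 15, 15, 20, 25, 35, 35, 20, 30, 20, 20)
fit <- lm(sales ~ adv + comp)

# Fitted values: design matrix times the estimated coefficients
fitted_manual <- drop(model.matrix(fit) %*% coef(fit))

# Residuals: observed minus fitted
resid_manual <- sales - fitted_manual
print(round(resid_manual, 5))

# With an intercept in the model, OLS residuals sum to (numerically) zero
print(sum(resid_manual))
```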
iii. > summary(reg)
Call:
lm(formula = sales ~ adv + comp)
Residuals:
Min 1Q Median 3Q Max
-5.1506 -1.4693 -0.2244 1.7688 3.7242
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.7958     3.8520  -4.880 0.000872 ***
adv           2.5248     0.1295  19.495 1.14e-08 ***
comp         -0.5449     0.1230  -4.432 0.001643 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.978 on 9 degrees of freedom
Multiple R-squared: 0.9779, Adjusted R-squared: 0.973
F-statistic: 199.2 on 2 and 9 DF, p-value: 3.539e-08
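R² = 0.9779, i.e. about 97.8% of the variation in sales is explained by advertisement and competition together, which shows that the model fits the data very well.
Adjusted R² = 0.973, i.e. about 97.3%. Because it adjusts R² for the number of predictors, it is a better measure of goodness of fit. It is almost the same as R², which means the model is good. Both predictors are also individually significant (p-values 1.14e-08 for adv and 0.001643 for comp).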
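The R² and adjusted R² values can be reproduced from their definitions, R² = 1 - SSE/SST and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of predictors. A minimal sketch in base R; the variable names are illustrative.

```r
sales <- c(27, 23, 31, 45, 47, 42, 39, 45, 57, 59, 73, 84)
adv   <- c(20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45)
comp  <- c(10, 15, 15, 15, 20, 25, 35, 35, 20, 30, 20, 20)
fit <- lm(sales ~ adv + comp)
n <- length(sales)
k <- 2  # number of predictors (adv and comp)

sse <- sum(residuals(fit)^2)          # variation left unexplained by the model
sst <- sum((sales - mean(sales))^2)   # total variation in sales

r2     <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(c(R2 = r2, adj_R2 = adj_r2), 4))
```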
iv. VIF:
> install.packages("car")
> library(car)
Loading required package: carData
Warning messages:
1: package ‘car’ was built under R version 4.0.5
2: package ‘carData’ was built under R version 4.0.3
> VIF=vif(reg)
> VIF
adv comp
1.221978 1.221978
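The Variance Inflation Factor (VIF) measures multicollinearity. For advertisement and competition VIF = 1.22, well below the commonly used cut-offs of 5 or 10. Therefore we conclude that there is no serious multicollinearity. With only two predictors both VIFs are equal, because each one is 1/(1 - r²), where r is the correlation between adv and comp.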
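The VIF reported by car::vif() can be checked by hand. For predictor j, VIF = 1/(1 - R_j²), where R_j² comes from regressing that predictor on the other predictors; with only two predictors R_j² is just the squared correlation between them. A base-R sketch (variable names are illustrative):

```r
adv  <- c(20, 20, 25, 28, 29, 28, 31, 34, 35, 36, 41, 45)
comp <- c(10, 15, 15, 15, 20, 25, 35, 35, 20, 30, 20, 20)

# With two predictors, R_j^2 equals the squared correlation r^2
r <- cor(adv, comp)
vif_from_r <- 1 / (1 - r^2)

# Same value via the general definition: regress one predictor on the other
r2_adv <- summary(lm(adv ~ comp))$r.squared
vif_from_lm <- 1 / (1 - r2_adv)

print(c(r = r, VIF = vif_from_r, VIF_lm = vif_from_lm))
```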
2. Regression of points scored on pass yards, rush yards, and points allowed
v. >ptsscored=c(38,42,38,27,30,40,45,30,37,26,51,40,27,28,3
1,35)
>passyard=c(256,326,314,304,313,352,358,303,375,249,4
78,295,377,243,273,281)
>rushyards=c(106,127,77,142,126,94,198,49,139,118,98,1
74,94,60,154,99)
>ptsallowed=c(28,37,27,23,14,43,10,23,21,14,54,33,30,22
,22,18)
> reg1=lm(ptsscored~passyard+rushyards+ptsallowed)
> reg1
Call:
lm(formula = ptsscored ~ passyard + rushyards + ptsallowed)
Coefficients:
(Intercept)     passyard    rushyards   ptsallowed
    7.37778      0.03859      0.06653      0.30279
vi. > names(reg1)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
> fitted.values(reg1)
       1        2        3        4        5        6        7        8
32.78613 39.60938 32.79190 35.51943 32.07711 40.23380 37.39258 29.29342
       9       10       11       12       13       14       15       16
37.45385 29.07537 48.69239 40.32905 37.26220 27.40734 34.81887 30.25717
> residuals(reg1)
          1           2           3           4           5           6
  5.2138729   2.3906192   5.2081010  -8.5194349  -2.0771111  -0.2337994
          7           8           9          10          11          12
  7.6074176   0.7065770  -0.4538513  -3.0753728   2.3076051  -0.3290500
         13          14          15          16
-10.2621987   0.5926626  -3.8188669   4.7428298
vii. > summary(reg1)
Call:
lm(formula = ptsscored ~ passyard + rushyards + ptsallowed)
Residuals:
Min 1Q Median 3Q Max
-10.2622 -2.3267 0.1794 2.9787 7.6074
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.37778    8.35058   0.884   0.3943
passyard     0.03859    0.02993   1.289   0.2216
rushyards    0.06653    0.03796   1.753   0.1051
ptsallowed   0.30279    0.16220   1.867   0.0866 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.425 on 12 degrees of freedom
Multiple R-squared: 0.5582, Adjusted R-squared: 0.4478
F-statistic: 5.054 on 3 and 12 DF, p-value: 0.01717
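R² = 0.5582, i.e. about 55.8% of the variation in points scored is explained by pass yards, rush yards, and points allowed together, which is only a moderate fit.
Adjusted R² = 0.4478, i.e. about 44.8%. Since it adjusts for the number of predictors, it is a better representation of how good the model is, and at 44.8% the model is not a strong fit. The overall F-test is significant at 5% (p-value = 0.01717), but none of the individual coefficients is significant at 5% (p-values 0.2216, 0.1051, and 0.0866).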
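The overall F-statistic can be recovered from R² as F = (R²/k) / ((1 - R²)/(n - k - 1)), with k predictors and n observations; its p-value comes from the F distribution with k and n - k - 1 degrees of freedom. This base-R sketch (variable names are illustrative) also prints the individual slope p-values, showing the contrast described above: a significant overall test but no individually significant slope.

```r
ptsscored  <- c(38, 42, 38, 27, 30, 40, 45, 30, 37, 26, 51, 40, 27, 28, 31, 35)
passyard   <- c(256, 326, 314, 304, 313, 352, 358, 303, 375, 249, 478, 295, 377, 243, 273, 281)
rushyards  <- c(106, 127, 77, 142, 126, 94, 198, 49, 139, 118, 98, 174, 94, 60, 154, 99)
ptsallowed <- c(28, 37, 27, 23, 14, 43, 10, 23, 21, 14, 54, 33, 30, 22, 22, 18)

fit1 <- lm(ptsscored ~ passyard + rushyards + ptsallowed)
n <- length(ptsscored)
k <- 3  # number of predictors

# Overall F statistic from R^2 and its upper-tail p-value
r2 <- summary(fit1)$r.squared
F_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_F <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
print(c(R2 = r2, F = F_stat, p = p_F))

# Individual t-test p-values for the three slopes
print(summary(fit1)$coefficients[-1, "Pr(>|t|)"])
```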
viii. VIF:
> library(car)
> VIF=vif(reg1)
> VIF
  passyard  rushyards ptsallowed
  1.636215   1.164668   1.739894
VIF measures multicollinearity. For pass yards, rush yards, and points allowed the VIFs are 1.64, 1.16, and 1.74 respectively, all well below the common cut-offs of 5 or 10. Thus we conclude there is no serious multicollinearity.
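With more than two predictors, each VIF comes from a separate auxiliary regression of one predictor on all the others: VIF_j = 1/(1 - R_j²). The loop below is a base-R sketch of that definition (the names `preds` and `vif_manual` are illustrative). For numeric predictors like these it should give the same values as car::vif().

```r
passyard   <- c(256, 326, 314, 304, 313, 352, 358, 303, 375, 249, 478, 295, 377, 243, 273, 281)
rushyards  <- c(106, 127, 77, 142, 126, 94, 198, 49, 139, 118, 98, 174, 94, 60, 154, 99)
ptsallowed <- c(28, 37, 27, 23, 14, 43, 10, 23, 21, 14, 54, 33, 30, 22, 22, 18)
preds <- data.frame(passyard, rushyards, ptsallowed)

# For each predictor v: regress v on the remaining predictors,
# take that R^2, and convert it to a VIF
vif_manual <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  r2_j <- summary(lm(reformulate(others, response = v), data = preds))$r.squared
  1 / (1 - r2_j)
})
print(round(vif_manual, 6))
```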
3. H0: mu = 310 vs HA: mu ≠ 310, two-tailed test at 5% alpha
> n=25 ; xbar=300 ; mu=310 ; sd=18.5
> SE=sd/sqrt(n-1)
> SE
[1] 3.776297
> tcal=(xbar-mu)/SE
> tcal
[1] -2.648097
> PV=2*pt(abs(tcal),df=n-1,lower.tail=F)
> PV
[1] 0.01408115
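Since the p-value (0.0141) is less than alpha (0.05), we reject the null hypothesis and conclude that the mean weight of this species of turtle is significantly different from 310 pounds. Equivalently, |tcal| = 2.648 is greater than the two-tailed critical value t(0.025, 24) ≈ 2.064.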
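The decision can be checked with both the critical-value and the p-value approach. The sketch below keeps the standard error sd/sqrt(n - 1) from the original, which is appropriate if 18.5 was computed with divisor n. As a labeled assumption check, it also recomputes the test with sd/sqrt(n), which is appropriate if 18.5 is already the usual (n - 1)-divisor sample standard deviation. The names `t_crit`, `tcal_alt` and `PV_alt` are illustrative.

```r
n <- 25; xbar <- 300; mu <- 310; sd <- 18.5
alpha <- 0.05

# Standard error as in the original: sd / sqrt(n - 1)
SE <- sd / sqrt(n - 1)
tcal <- (xbar - mu) / SE

# Two-tailed critical value and two-tailed p-value
t_crit <- qt(1 - alpha / 2, df = n - 1)
PV <- 2 * pt(abs(tcal), df = n - 1, lower.tail = FALSE)

decision <- if (PV < alpha) "reject H0" else "fail to reject H0"
print(c(tcal = tcal, t_crit = t_crit, PV = PV))
print(decision)

# Assumption check: if 18.5 were already the (n - 1)-divisor sample sd,
# the standard error would be sd / sqrt(n) instead
tcal_alt <- (xbar - mu) / (sd / sqrt(n))
PV_alt <- 2 * pt(abs(tcal_alt), df = n - 1, lower.tail = FALSE)
print(c(tcal_alt = tcal_alt, PV_alt = PV_alt))
```

Under either convention |t| exceeds the critical value, so the conclusion to reject H0 does not depend on this choice.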
4. Weight of men = mu1
Weight of women = mu2
H0: mu1 = mu2 vs HA: mu1 > mu2, paired t-test at 1% alpha level
> wtmales=c(67.8,60,63.4,76,89.4,73.3,67.3,61.3,62.4)
> wtfemales=c(38.9,61.2,73.3,21.8,63.4,64.6,48.4,48.8,48.5)
> pttest=t.test(wtmales,wtfemales,mu=0,paired=TRUE,alternative='greater')
> pttest
Paired t-test
data: wtmales and wtfemales
t = 2.726, df = 8, p-value = 0.013
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 5.368299      Inf
sample estimates:
mean of the differences
               16.88889
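Since the p-value (0.013) is greater than alpha (0.01), we fail to reject the null hypothesis and conclude that, at the 1% level, there is no significant evidence that the mean weight of men is greater than that of women. The result would be significant at the 5% level, since 0.013 < 0.05.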
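A paired t-test is a one-sample t-test on the pairwise differences d = men - women: t = mean(d) / (sd(d)/sqrt(n)) with n - 1 degrees of freedom, and for HA: mu1 > mu2 the p-value is the upper tail. A base-R sketch of that computation (the names `d`, `t_stat` and `p_val` are illustrative):

```r
wtmales   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
wtfemales <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)

d <- wtmales - wtfemales          # pairwise differences
n <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))

# One-sided (greater) p-value from the t distribution with n - 1 df
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)
print(c(mean_d = mean(d), t = t_stat, p = p_val))
```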
5. H0: the mean strengths of the four raw materials (strength 1, 2, 3, 4) are all equal
HA: at least one of the mean strengths is different
>st1=c(11.715501,11.981569,8.0439292,10.55816,14.079463,10.776867,7.8602695,11.889672,11.942314,13.177454)
>st2=c(10.566155,13.455359,7.4188405,12.031314,7.7766332,10.748939,10.72698,4.4772914,6.8038204,5.3718922)
>st3=c(10.283346,12.177732,10.559808,9.6551865,8.7902748,10.862457,10.378184,10.188052,11.62452,12.305905)
>st4=c(6.903486,8.9901103,6.9712734,9.1603896,8.6784264,11.443832,10.780441,5.66676,10.776041,9.0087649)
> d=stack(list(b1=st1,b2=st2,b3=st3,b4=st4))
> names(d)
[1] "values" "ind"
> av1=aov(values~ind,data=d)
> av1
> summary(av1)
            Df Sum Sq Mean Sq F value Pr(>F)
ind          3  43.62  14.540   3.303 0.0311 *
Residuals   36 158.47   4.402
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
At alpha = 5% we reject the null hypothesis because the p-value 0.0311 is less than 0.05, and conclude that at least one of the raw materials has a different mean strength.
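The one-way ANOVA table can be rebuilt from its definitions. The between-group sum of squares measures how far the group means lie from the grand mean. The within-group sum of squares measures how far observations lie from their own group mean. F is the ratio of the two mean squares, with k - 1 and N - k degrees of freedom. A base-R sketch (the names `groups`, `ss_between` and `ss_within` are illustrative):

```r
st1 <- c(11.715501, 11.981569, 8.0439292, 10.55816, 14.079463,
         10.776867, 7.8602695, 11.889672, 11.942314, 13.177454)
st2 <- c(10.566155, 13.455359, 7.4188405, 12.031314, 7.7766332,
         10.748939, 10.72698, 4.4772914, 6.8038204, 5.3718922)
st3 <- c(10.283346, 12.177732, 10.559808, 9.6551865, 8.7902748,
         10.862457, 10.378184, 10.188052, 11.62452, 12.305905)
st4 <- c(6.903486, 8.9901103, 6.9712734, 9.1603896, 8.6784264,
         11.443832, 10.780441, 5.66676, 10.776041, 9.0087649)

groups <- list(st1 = st1, st2 = st2, st3 = st3, st4 = st4)
all_vals <- unlist(groups)
grand_mean <- mean(all_vals)
k <- length(groups)      # number of groups
N <- length(all_vals)    # total number of observations

# Between-group SS: spread of group means around the grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Within-group SS: spread of observations around their own group mean
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

F_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_val <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
print(round(c(SSB = ss_between, SSW = ss_within, F = F_stat, p = p_val), 4))
```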