Solutions To Coding
VIN-633007415 - STAT-654 Assignment-1
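#1.
For the simple linear regression model,
\[ y_i = \beta_0 + \beta_1 x_i + e_i \;\Rightarrow\; e_i = y_i - \hat{y}_i . \]
The sum of squared residuals is
\[ SSE = \sum e_i^2 = \sum \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2 . \]
To determine $\beta_0$ and $\beta_1$, we have to minimize SSE.
\[ \frac{\partial SSE}{\partial \beta_0} = 2 \sum \left[ y_i - (\beta_0 + \beta_1 x_i) \right](-1) = 0 \]
\[ \therefore\; -\sum y_i + \sum \beta_0 + \beta_1 \sum x_i = 0 \]
With $\bar{y} = \sum y_i / n$ and $\bar{x} = \sum x_i / n$,
\[ -n\bar{y} + n\beta_0 + n\beta_1 \bar{x} = 0 \]
\[ \therefore\; \beta_0 = \bar{y} - \beta_1 \bar{x} . \]
Now,
\[ \frac{\partial SSE}{\partial \beta_1} = 2 \sum \left[ y_i - (\beta_0 + \beta_1 x_i) \right](-x_i) = 0 \]
\[ \therefore\; \sum \left[ \bar{y} - \beta_1 \bar{x} + \beta_1 x_i - y_i \right] x_i = 0 \]
\[ \bar{y} \sum x_i - \beta_1 \bar{x} \sum x_i + \beta_1 \sum x_i^2 - \sum x_i y_i = 0 \]
\[ \beta_1 \left[ n\bar{x}^2 - \sum x_i^2 \right] = n\bar{x}\bar{y} - \sum x_i y_i \]
\[ \therefore\; \beta_1 = \frac{n\bar{x}\bar{y} - \sum x_i y_i}{n\bar{x}^2 - \sum x_i^2}
 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}
 = \frac{\sum x_i y_i - \bar{x} \sum y_i}{\sum (x_i - \bar{x})^2} . \]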
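The closed-form estimates can be checked numerically. The sketch below is only an illustration and not part of the original solution: it takes the six (Concentration, Strength) rows printed under Question D1 as `x` and `y`, and all variable names are assumptions made for this example. It computes the slope and intercept from the formulas above and compares them with the coefficients returned by `lm()`.

```r
# Six observations taken from the data rows printed in Question D1
x <- c(1.0, 1.5, 2.0, 3.0, 4.0, 4.5)
y <- c(6.3, 11.1, 20.0, 24.0, 26.1, 30.0)

n     <- length(x)
x_bar <- mean(x)
y_bar <- mean(y)

# Slope: (sum(x*y) - n*x_bar*y_bar) / sum((x - x_bar)^2)
b1 <- (sum(x * y) - n * x_bar * y_bar) / sum((x - x_bar)^2)
# Intercept: y_bar - b1 * x_bar
b0 <- y_bar - b1 * x_bar

# Reference fit from R's built-in least-squares routine
fit <- lm(y ~ x)

closed_form <- c(b0, b1)
from_lm     <- unname(coef(fit))
print(rbind(closed_form, from_lm))
```

The two rows should agree up to floating-point error, since `lm()` minimizes the same sum of squares.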
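Sample correlation coefficient and $R^2$, the coefficient of determination:
① Residuals: $e_i = y_i - \hat{y}_i$, so $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$.
② Total sum of squares:
\[ SST = \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 . \]
Now,
\[ \hat{y}_i = \beta_0 + \beta_1 x_i , \qquad \bar{y} = \beta_0 + \beta_1 \bar{x} \]
\[ \therefore\; \hat{y}_i - \bar{y} = \beta_1 (x_i - \bar{x}) . \]
Also, since $\beta_0 = \bar{y} - \beta_1 \bar{x}$,
\[ y_i - \hat{y}_i = y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i = (y_i - \bar{y}) + \beta_1 (\bar{x} - x_i) . \]
The cross term is
\[ \sum (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
 = \sum \beta_1 (x_i - \bar{x}) \left[ (y_i - \bar{y}) + \beta_1 (\bar{x} - x_i) \right]
 = \beta_1 \sum (x_i - \bar{x})(y_i - \bar{y}) - \beta_1^2 \sum (x_i - \bar{x})^2 \]
\[ = \beta_1^2 \sum (x_i - \bar{x})^2 - \beta_1^2 \sum (x_i - \bar{x})^2 = 0 , \]
using $\sum (x_i - \bar{x})(y_i - \bar{y}) = \beta_1 \sum (x_i - \bar{x})^2$ from the least-squares estimate of $\beta_1$.
Hence,
\[ SST = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 \]
\[ \therefore\; SST = SSE + \sum (\hat{y}_i - \bar{y})^2 = SSE + SSR , \]
where $SSR = \sum (\hat{y}_i - \bar{y})^2$ is the regression sum of squares.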
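The decomposition can also be verified numerically. The following is a small sketch, assuming the six data rows printed under Question D1 as input (the variable names are illustrative only). It checks that the cross term vanishes, that SST equals SSE plus SSR, and that in simple linear regression the coefficient of determination equals the squared sample correlation.

```r
x <- c(1.0, 1.5, 2.0, 3.0, 4.0, 4.5)
y <- c(6.3, 11.1, 20.0, 24.0, 26.1, 30.0)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

sst   <- sum((y - mean(y))^2)                    # total sum of squares
sse   <- sum((y - y_hat)^2)                      # residual (error) sum of squares
ssr   <- sum((y_hat - mean(y))^2)                # regression sum of squares
cross <- sum((y - y_hat) * (y_hat - mean(y)))    # cross term, expected to be ~0

r_squared <- ssr / sst
print(c(SST = sst, SSE_plus_SSR = sse + ssr, cross_term = cross))
print(c(R2 = r_squared, squared_correlation = cor(x, y)^2))
```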
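#2. Given that $\mu_X = 2$:
\[ \mu_{Y|X}(x) = \beta_0^* + \beta_1^* (x - \mu_X) + \beta_2^* (x - \mu_X)^2 \]
\[ = \beta_0^* + \beta_1^* (x - 2) + \beta_2^* (x - 2)^2 \]
\[ = \beta_0^* + \beta_1^* x - 2\beta_1^* + \beta_2^* x^2 - 4\beta_2^* x + 4\beta_2^* \]
\[ = (\beta_0^* - 2\beta_1^* + 4\beta_2^*) + (\beta_1^* - 4\beta_2^*)\, x + \beta_2^* x^2 . \]
Matching the coefficients of $x^2$, $x$ and the constant term:
\[ \beta_2^* = 0.7 \]
\[ \beta_1^* - 4\beta_2^* = -3.2 \;\Rightarrow\; \beta_1^* - 4(0.7) = -3.2 \;\Rightarrow\; \beta_1^* = -3.2 + 2.8 = -0.4 \]
And
\[ \beta_0^* - 2(-0.4) + 4(0.7) = -8.5 \;\Rightarrow\; \beta_0^* = -8.5 - 3.6 = -12.1 . \]
Thus, the centered model is
\[ \mu_{Y|X}(x) = -12.1 - 0.4 (x - 2) + 0.7 (x - 2)^2 . \]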
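One way to check a re-centering like this is to convert the coefficients programmatically and compare the two forms on a grid of x values. The helper `center_quadratic` below is hypothetical (it is not part of the assignment), and the uncentered coefficients passed to it, in particular the intercept of -8.5, are this example's reading of the values used in the working above.

```r
# Hypothetical helper: rewrite b0 + b1*x + b2*x^2 as
# c0 + c1*(x - mu) + c2*(x - mu)^2
center_quadratic <- function(b0, b1, b2, mu) {
  c2 <- b2
  c1 <- b1 + 2 * b2 * mu          # from c1 - 2*mu*c2 = b1
  c0 <- b0 + b1 * mu + b2 * mu^2  # value of the polynomial at x = mu
  c(c0 = c0, c1 = c1, c2 = c2)
}

b  <- c(b0 = -8.5, b1 = -3.2, b2 = 0.7)  # assumed uncentered coefficients
mu <- 2
cc <- center_quadratic(b[["b0"]], b[["b1"]], b[["b2"]], mu)

# Evaluate both parameterizations on the same grid
x_grid     <- seq(0, 4, by = 0.5)
uncentered <- b[["b0"]] + b[["b1"]] * x_grid + b[["b2"]] * x_grid^2
centered   <- cc[["c0"]] + cc[["c1"]] * (x_grid - mu) + cc[["c2"]] * (x_grid - mu)^2

print(cc)
print(all.equal(uncentered, centered))
```

If the two vectors agree, the centered and uncentered forms describe the same mean function.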
*Help of ChatGPT was used in learning some part of the code.
Question D
R Code with Outputs
#Question D1
  Concentration Strength
1           1.0      6.3
2           1.5     11.1
3           2.0     20.0
4           3.0     24.0
5           4.0     26.1
6           4.5     30.0
Residuals:
Min 1Q Median 3Q Max
-4.6250 -1.6109 0.0413 1.5892 5.0216
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.975562 0.869032 51.754 < 2e-16 ***
poly(Concentration_centered, 3, raw = TRUE)1 4.339394 0.350978 12.364 2.87e-09 ***
poly(Concentration_centered, 3, raw = TRUE)2 -0.548873 0.039199 -14.002 5.11e-10 ***
poly(Concentration_centered, 3, raw = TRUE)3 -0.055188 0.009789 -5.638 4.72e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The adjusted R-squared is 0.9648 and the model p-value is 1.025e-11, which is far below the 0.01 significance level.
Thus, the model is statistically significant and can be used to predict the tensile strength.
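The adjusted R-squared and the overall p-value are read off `summary()`. As a rough sketch of how they can be extracted programmatically, the code below fits the same kind of model to only the six rows printed at the start of this question, so its numbers will differ from the full-data output. Centering by subtracting the sample mean, and the object names `hc_small` and `fit3`, are assumptions of this example.

```r
hc_small <- data.frame(
  Concentration = c(1.0, 1.5, 2.0, 3.0, 4.0, 4.5),
  Strength      = c(6.3, 11.1, 20.0, 24.0, 26.1, 30.0)
)
# Assumed centering: subtract the sample mean of the predictor
hc_small$Concentration_centered <-
  hc_small$Concentration - mean(hc_small$Concentration)

fit3 <- lm(Strength ~ poly(Concentration_centered, 3, raw = TRUE), data = hc_small)
s    <- summary(fit3)

adj_r2 <- s$adj.r.squared
# The overall F-test p-value is not stored in the summary object;
# compute it from the F statistic and its degrees of freedom
f       <- s$fstatistic
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(c(adjusted_R2 = adj_r2, model_p_value = p_value))
```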
#Question D2
# Plot the data and the fitted model
plot(hc$Concentration_centered, hc$Strength, main="Scatter Plot with Third Order Polynomial Fit",
     xlab = "Concentration(centered)", ylab = "Strength", pch=19)
points(hc$Concentration_centered,fitted(hc3), col="red",pch=19)
# Adding the fitted curve
curve(predict(hc3,newdata = data.frame(Concentration_centered=x)), add = TRUE, col="blue",lwd=2)
As seen in the above graph, it appears that the third order polynomial fits the dataset well.
#Question D3
# Residuals vs Fitted values plot
plot(fitted(hc3),resid(hc3), xlab = "Fitted Values", ylab="Residuals", main = "Residuals vs Fitted Values")
abline(h=0, col="red", lty=2)
# Fit a fifth order polynomial model to this data
hc5 <- lm(Strength ~ poly(Concentration_centered, 5, raw=TRUE), data=hc)
#Summary of the model to get the adjusted R-squared and p-value
summary(hc5)
Call:
lm(formula = Strength ~ poly(Concentration_centered, 5, raw = TRUE),
data = hc)
Residuals:
Min 1Q Median 3Q Max
-2.65167 -0.91159 -0.03811 0.96396 2.56865
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.6187788 0.7309210 59.676 < 2e-16 ***
poly(Concentration_centered, 5, raw = TRUE)1 5.3479308 0.3896655 13.724 4.11e-09 ***
poly(Concentration_centered, 5, raw = TRUE)2 -0.1378567 0.1059263 -1.301 0.215700
poly(Concentration_centered, 5, raw = TRUE)3 -0.1630817 0.0289147 -5.640 8.06e-05 ***
poly(Concentration_centered, 5, raw = TRUE)4 -0.0114448 0.0026525 -4.315 0.000840 ***
poly(Concentration_centered, 5, raw = TRUE)5 0.0021978 0.0005163 4.257 0.000935 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The adjusted R-squared is 0.98, which is greater than that of the third-order polynomial (0.9648), and the model p-value is lower than both 0.01 and the p-value of the third-order polynomial.
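Because the third-order model is nested in the fifth-order one, the comparison can also be made with a partial F-test rather than by adjusted R-squared alone. The sketch below uses simulated data (the `sim` data frame and its coefficients are invented for illustration and are not the assignment's dataset) and `anova()` to test whether the fourth- and fifth-order terms add explanatory power.

```r
set.seed(1)
# Simulated stand-in data, not the assignment's dataset
sim <- data.frame(xc = seq(-7, 7, length.out = 19))
sim$y <- 45 + 4 * sim$xc - 0.5 * sim$xc^2 - 0.05 * sim$xc^3 + rnorm(19, sd = 2)

m3 <- lm(y ~ poly(xc, 3, raw = TRUE), data = sim)
m5 <- lm(y ~ poly(xc, 5, raw = TRUE), data = sim)

# Adjusted R-squared of each model
adj <- c(order3 = summary(m3)$adj.r.squared,
         order5 = summary(m5)$adj.r.squared)
print(adj)

# Partial F-test: H0 says the 4th- and 5th-order coefficients are both zero
cmp <- anova(m3, m5)
print(cmp)
p_extra <- cmp[2, "Pr(>F)"]
```

A small `p_extra` would suggest the higher-order terms are worth keeping; a large one would favor the simpler third-order model.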
Question D4
# p-values for the polynomial terms (excluding the intercept)
summary_hc5 <- summary(hc5)
p_values <- summary_hc5$coefficients[-1, "Pr(>|t|)"]
# Index of the term with the largest p-value. p_values already excludes the
# intercept, so the index lines up with the polynomial terms without an offset
largest_p_value_index <- which.max(p_values)
# Generating the polynomial terms
poly_terms <- paste0("I(Concentration_centered^", 1:5, ")", collapse=" + ")
# Split into individual terms (fixed = TRUE matches " + " literally, not as a regex)
poly_terms <- strsplit(poly_terms, " + ", fixed = TRUE)[[1]]
# Removing the term with largest p-value
poly_terms <- poly_terms[-largest_p_value_index]
newf <- as.formula(paste("Strength ~ ", paste(poly_terms, collapse=" + ")))
# Refit the model with the remaining terms
hc5_reduced <- lm(newf, data = hc)
summary(hc5_reduced)
Call:
lm(formula = newf, data = hc)
Residuals:
Min 1Q Median 3Q Max
-2.65167 -0.91159 -0.03811 0.96396 2.56865
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.6187788 0.7309210 59.676 < 2e-16 ***
I(Concentration_centered^1) 5.3479308 0.3896655 13.724 4.11e-09 ***
I(Concentration_centered^2) -0.1378567 0.1059263 -1.301 0.215700
I(Concentration_centered^3) -0.1630817 0.0289147 -5.640 8.06e-05 ***
I(Concentration_centered^4) -0.0114448 0.0026525 -4.315 0.000840 ***
I(Concentration_centered^5) 0.0021978 0.0005163 4.257 0.000935 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Scatterplot of the data with the fitted curve from the final model
plot(hc$Concentration_centered, hc$Strength, main="Scatter Plot with Reduced Model Fit",
xlab="Concentration (Centered)", ylab="Strength", pch=19)
lines(sort(hc$Concentration_centered), predict(hc5_reduced)[order(hc$Concentration_centered)],
col="blue", lwd=2)
Omitting the polynomial term with the highest p-value did not change the reported fit: the summary above shows the same coefficients as the full fifth-order model, and the R-squared is still 0.9847.
Both visually and statistically, the fifth-degree polynomial fits the dataset better than the third-order polynomial.
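When a term is dropped by editing a formula string, it is worth confirming that the term really left the model, since a reduced fit should list one fewer coefficient. The sketch below uses a hypothetical three-term string to show how `strsplit()` behaves with and without `fixed = TRUE`, and how `reformulate()` can build the reduced formula from a character vector. The names `xc` and `y` are placeholders.

```r
terms_str <- paste0("I(xc^", 1:3, ")", collapse = " + ")

# Default: " + " is read as a regular expression (a space repeated one or
# more times, then another space), so the literal plus signs never match
split_regex <- strsplit(terms_str, " + ")[[1]]

# fixed = TRUE matches the literal three-character string " + "
split_fixed <- strsplit(terms_str, " + ", fixed = TRUE)[[1]]

print(length(split_regex))
print(length(split_fixed))

# A negative index beyond the vector length removes nothing and gives no warning
unchanged <- split_regex[-2]

# Drop the second term and rebuild the formula from the remaining terms
reduced <- reformulate(split_fixed[-2], response = "y")
print(reduced)
```

Building the formula with `reformulate()` avoids the paste-and-split round trip, which is where the silent failure can occur.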
Question E
Question E1
which.min(AIC)
[1] 2
which.min(BIC)
[1] 2
Here, both AIC and BIC are lowest for the second-order polynomial, so that model is used for the further analysis.
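The selection step can be sketched as follows. The data here are simulated, since the original `x` and `y` for this question are not shown, and the maximum order of 4 is an arbitrary choice for the example. `AIC()` and `BIC()` are computed for each polynomial order, and `which.min()` returns the position of the smallest value.

```r
set.seed(42)
# Simulated stand-in data with a quadratic trend
x <- runif(30, 0, 10)
y <- 6 + 4 * x - 0.4 * x^2 + rnorm(30, sd = 1.5)
dat <- data.frame(x = x, y = y)

orders <- 1:4
fits <- lapply(orders, function(d) lm(y ~ poly(x, d, raw = TRUE), data = dat))

# One information-criterion value per candidate order
aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)

print(data.frame(order = orders, AIC = aic, BIC = bic))
best_aic <- which.min(aic)
best_bic <- which.min(bic)
```

BIC penalizes each extra parameter by log(n) instead of 2, so with more than about seven observations it tends to favor the same or a smaller order than AIC.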
Question E2
summary(lm2)
Call:
lm(formula = y ~ poly(x, 2, raw = TRUE), data = data)
Residuals:
Min 1Q Median 3Q Max
-25.451 -10.027 -1.046 8.201 31.469
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.4693224 5.7627144 1.123 0.27271
poly(x, 2, raw = TRUE)1 0.4545286 0.0658670 6.901 3.89e-07 ***
poly(x, 2, raw = TRUE)2 -0.0004106 0.0001149 -3.573 0.00154 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
plot(lm2$fitted,lm2$res,main="",xlab="fitted",ylab="residuals",pch=19)
abline(h=mean(lm2$res),col="red")
The p-value is 4.062e-13.
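The reference line drawn at `mean(lm2$res)` is effectively a line at zero: for a least-squares fit that includes an intercept, the residuals sum to zero. A minimal check on simulated data follows (the data and the name `fit2` are placeholders, since the original `data` object is not shown).

```r
set.seed(7)
# Simulated stand-in data with a quadratic trend
x <- runif(25, 0, 10)
y <- 6 + 4 * x - 0.4 * x^2 + rnorm(25, sd = 1.5)

fit2 <- lm(y ~ poly(x, 2, raw = TRUE))
res  <- resid(fit2)

# Mean residual is zero up to floating-point error
print(mean(res))
# Residuals are also uncorrelated with the fitted values
print(cor(res, fitted(fit2)))
```

Any curvature or funnel shape in a residuals-versus-fitted plot therefore reflects model misfit, not an offset in the residuals.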
Question E4
prediction