Exercise V

This document outlines the process of calculating the least squares regression line in R for predicting final exam scores from ACT Mathematics scores. It includes the regression equation, the model summary, and a plot of the data points with the regression line, indicating a weak relationship between the variables. It also discusses the residual plot and provides a 95% confidence interval for the slope, concluding that there is no strong evidence of a significant relationship at the 95% confidence level.


1. Calculate the Least Squares Regression Line for These Data
In this section, we calculate the least squares regression line for the data. We use the lm()
function in R to fit a linear model that predicts the final exam score (y) based on the ACT
Mathematics score (x). The equation of the regression line is of the form:

yᵢ = β₀ + β₁xᵢ

Where:

• β₀ is the intercept (the value of y when x = 0),
• β₁ is the slope (the change in y for a one-unit change in x).

The summary(model) function provides the coefficients for the regression line.

# Data
x <- c(25, 20, 26, 26, 28, 28, 29, 32, 20, 25, 26, 28, 25, 31, 30)
y <- c(138, 84, 104, 112, 88, 132, 90, 183, 100, 143, 141, 161, 124,
118, 168)
data <- data.frame(x, y)

# Fit regression model
model <- lm(y ~ x, data = data)
summary(model)

# Regression line
beta_0 <- coef(model)[1]
beta_1 <- coef(model)[2]
cat("/////////////////////////\n")
cat("Regression line: y =", beta_0, "+", beta_1, "x\n")

Call:
lm(formula = y ~ x, data = data)

Residuals:
Min 1Q Median 3Q Max
-46.493 -15.593 3.856 21.940 33.057

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.478 55.952 0.116 0.9096
x 4.483 2.087 2.148 0.0511 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 27.02 on 13 degrees of freedom
Multiple R-squared: 0.262, Adjusted R-squared: 0.2052
F-statistic: 4.615 on 1 and 13 DF, p-value: 0.05113

/////////////////////////
Regression line: y = 6.477725 + 4.483294 x
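
As a quick sanity check (a sketch, not part of the original exercise), the same coefficients can be obtained from the closed-form least squares formulas β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄, reusing the x and y vectors defined above:

# Closed-form least squares estimates, compared to coef(model)
x_bar <- mean(x)
y_bar <- mean(y)
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
b0 <- y_bar - b1 * x_bar
c(intercept = b0, slope = b1)   # should match 6.4777 and 4.4833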

2. Plot the points and the least squares regression line on the same graph.
Here, we plot the data points along with the regression line. The data points are shown in blue,
and the regression line is plotted in red. We can see that many of the blue points, which
represent our actual data, lie noticeably far from the regression line. From the summary of
our model, the p-value associated with the explanatory variable x (ACT Mathematics score) is
0.0511, just above the conventional 0.05 threshold, and the R-squared is only 0.262. This
suggests that x does not explain much of the variability in y (the final exam score); the short
snippet below pulls these quantities directly from the model object.
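
As an optional side check (a sketch, reusing the model object fitted above), these quantities can be read programmatically from the model summary:

# Extract the p-value for x and the R-squared from the fit
s <- summary(model)
s$coefficients["x", "Pr(>|t|)"]   # about 0.0511
s$r.squared                       # about 0.262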

# Plot data and regression line
plot(data$x, data$y, main = "ACT vs Final Exam Scores",
     xlab = "ACT Mathematics Score", ylab = "Final Exam Score",
     pch = 19, col = "blue")
abline(model, col = "red", lwd = 2)
legend("topleft", legend = c("Data points", "Regression line"),
       col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), bty = "n")
3. Obtain the residual plot and comment on the appropriateness of the model.
The residual plot shows no clear trend or pattern, with the residuals scattered randomly around
the zero line. This indicates that the model has likely captured the systematic relationship
between the independent variable (ACT Mathematics score) and the dependent variable (final
exam score), supporting the appropriateness of a linear model. However, the high dispersion of
the residuals suggests that the model's predictions are quite variable and uncertain: it struggles
to consistently predict values close to the actual outcomes. This variability could be due to
other influencing factors that the model does not account for.

residuals <- resid(model)
plot(data$x, residuals, main = "Residual Plot",
     xlab = "ACT Mathematics Score", ylab = "Residuals",
     pch = 19, col = "purple")
abline(h = 0, col = "black", lty = 2)
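
As an optional extra diagnostic (not part of the original exercise), a normal Q-Q plot of the residuals can be used to check the normality assumption relied on in the next part:

# Optional: normal Q-Q plot of the residuals
qqnorm(residuals, main = "Normal Q-Q Plot of Residuals")
qqline(residuals, col = "red", lty = 2)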
4. Find 95% confidence interval for β under the usual assumptions. Comment in terms of the problem.
1. Distribution of β̂₁

Under the classical linear regression assumptions:

• The errors are independent and normally distributed: εᵢ ∼ N(0, σ²).

The sampling distribution of β̂₁ is:

(β̂₁ − β₁) / (S / √(Σ(xᵢ − x̄)²)) ∼ t(n − 2)

2. Hypothesis Testing Framework

Null Hypothesis (H₀): β₁ = 0
Alternative Hypothesis (H₁): β₁ ≠ 0

Test statistic under H₀:

T = β̂₁ / (S / √(Σ(xᵢ − x̄)²)) ∼ t(n − 2)
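
To make the test statistic concrete (a sketch, reusing x, model, and beta_1 from part 1), T can be recomputed by hand and compared with the t value of 2.148 reported by summary(model):

# Manual t statistic for H0: beta_1 = 0
S     <- summary(model)$sigma    # residual standard error (about 27.02)
Sxx   <- sum((x - mean(x))^2)    # sum of squared deviations of x
se_b1 <- S / sqrt(Sxx)           # standard error of beta_1 (about 2.087)
beta_1 / se_b1                   # about 2.148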

3. Decision Rule Using Confidence Interval Approach


For a significance level α, we construct a (1 − α) × 100% confidence interval:

β̂₁ ± t_{n−2, 1−α/2} × S / √(Σ(xᵢ − x̄)²)

Decision Rule:

• If 0 is NOT in the confidence interval → Reject H₀.
• If 0 is in the confidence interval → Fail to reject H₀.

In our case (α = 0.05):

• 95% CI for β₁: [-0.025, 8.992]
• Since 0 ∈ [-0.025, 8.992], we cannot reject H₀.
• This suggests that there is no strong evidence of a significant linear relationship between the ACT Mathematics score (x) and the final exam score (y) at the 95% confidence level.
# Confidence interval for beta_1
confint(model, level = 0.95)

2.5 % 97.5 %
(Intercept) -114.39837902 127.353828
x -0.02546064 8.992048
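
As a cross-check on the formula from part 3 (a sketch, reusing x, data, model, and beta_1 defined above), the same interval can be computed by hand:

# Manual 95% CI for beta_1; should reproduce [-0.025, 8.992]
S      <- summary(model)$sigma               # residual standard error
se_b1  <- S / sqrt(sum((x - mean(x))^2))     # standard error of beta_1
t_crit <- qt(0.975, df = nrow(data) - 2)     # t quantile with n - 2 df
beta_1 + c(-1, 1) * t_crit * se_b1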
