
Lab3 Report Revathy

The document describes creating linear regression models to predict mpg and horsepower for an automotive dataset. It fits a model with mpg as the response and horsepower as a predictor, as well as a multiple regression model with horsepower as the response and several other variables as predictors. Residual plots and other diagnostics are used to check that the assumptions of linear regression are met.

Uploaded by

Revathy P

lab3_report_Revathy.R
revak

2024-02-04
#Setup
# Set the CRAN mirror to Cloudflare
options(repos = structure(c(CRAN = "https://cloud.r-project.org/")))

# Install (if not already installed) and load packages
library(ggplot2)
library(ISLR2)
data("Auto")
if (!requireNamespace("car", quietly = TRUE)) install.packages("car")

## Installing package into 'C:/Users/revak/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)

## package 'car' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\revak\AppData\Local\Temp\RtmpCo4S7s\downloaded_packages

library(car)

## Loading required package: carData

#Lab Steps
# Step 1: Create a linear model with response = mpg, and explanatory variable horsepower
# Code for creating the linear model
mpg_model <- lm(mpg ~ horsepower, data = Auto)

# Print summary for the linear model


summary(mpg_model)

##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
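
The columns of the coefficient table are related by simple arithmetic, and checking that by hand can make the summary easier to read. The sketch below copies the slope and standard error from the output above; the variable names (`estimate`, `std_error`, `slope_ci`) are illustrative and not part of the original script.

```r
# Values copied from the summary(mpg_model) output above
estimate  <- -0.157845   # horsepower slope
std_error <- 0.006446    # its standard error
df_resid  <- 390         # residual degrees of freedom

# The t value is the estimate divided by its standard error
t_value <- estimate / std_error

# With a single predictor, the F-statistic equals the squared t value
f_value <- t_value^2

# Approximate 95% confidence interval for the slope
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- estimate + c(-1, 1) * t_crit * std_error

cat("t value:", round(t_value, 2), "\n")
cat("F value:", round(f_value, 1), "\n")
cat("95% CI for slope:", round(slope_ci, 4), "\n")
```

Because the inputs are the rounded values printed by `summary()`, the results agree with the reported t value and F-statistic only up to rounding. On the fitted object itself, `confint(mpg_model)` returns the interval directly.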

# Step 2: Plot a scatter plot using the residuals and fitted values
# Code for plotting residuals vs fitted values
plot(mpg_model$fitted.values, mpg_model$residuals,
     main = "Residuals vs Fitted",
     xlab = "Fitted Values", ylab = "Residuals")

# Step 3: Does the scatter plot provide evidence that a linear regression assumption is violated? Explain.
# Print summary results for reference
model_summary <- summary(mpg_model)
print(model_summary)

##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16

# Extract relevant values for commenting


residual_std_error <- model_summary$sigma
r_squared <- model_summary$r.squared

cat("\nResidual Standard Error:", residual_std_error, "\n")

##
## Residual Standard Error: 4.905757

cat("Multiple R-squared:", r_squared, "\n")

## Multiple R-squared: 0.6059483

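# Answer: Yes. The Residual Standard Error (4.906) and Multiple R-squared
# (0.6059) measure how well the line fits; they do not show whether the
# regression assumptions hold, so the judgment has to come from the plot.
# For mpg ~ horsepower on the Auto data, the residuals-vs-fitted plot shows a
# U-shaped pattern: residuals are mostly positive at low and high fitted
# values and mostly negative in between. The spread also appears to widen as
# the fitted values increase. This is evidence against the linearity and
# constant-variance assumptions.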
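
One way to probe the linearity assumption more directly than with summary statistics is to compare the straight-line fit against a fit with an added quadratic term. The sketch below uses simulated data as a stand-in for Auto so that it runs with base R only; the generating coefficients and the names `linear_fit` and `quadratic_fit` are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the Auto data (assumption: a curved relationship
# between a horsepower-like predictor and an mpg-like response)
x <- seq(50, 230, length.out = 200)
y <- 57 - 0.47 * x + 0.0012 * x^2 + rnorm(200, sd = 4)

linear_fit    <- lm(y ~ x)
quadratic_fit <- lm(y ~ x + I(x^2))

# R-squared of the straight-line fit can look respectable
linear_r2 <- summary(linear_fit)$r.squared

# A nested-model F-test checks whether the quadratic term is still needed
comparison  <- anova(linear_fit, quadratic_fit)
p_quadratic <- comparison[["Pr(>F)"]][2]

cat("Linear R-squared:", round(linear_r2, 3), "\n")
cat("p-value for adding the quadratic term:", signif(p_quadratic, 3), "\n")
```

On the report's data, the analogous comparison would be `anova(lm(mpg ~ horsepower, data = Auto), lm(mpg ~ horsepower + I(horsepower^2), data = Auto))`.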

# Step 4: Plot a histogram and QQPlot for the residuals


# Code for plotting histogram of residuals
hist(mpg_model$residuals, main = "Histogram of Residuals", xlab = "Residuals")
# Code for plotting QQ plot of residuals
qqPlot(mpg_model$residuals, main = "QQ Plot of Residuals")
## 323 330
## 321 328

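# Step 5: Do you believe the residuals are approximately normally distributed?
# Answer: Only approximately. qqPlot() labels observations 323 and 330 (rows
# 321 and 328) as the most extreme residuals, and the residual summary
# (min -13.57, median -0.34, max 16.92) points to a mild right skew, so the
# departure from normality is in the tails.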
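
A QQ plot is a visual check, so it can help to pair it with a numeric one. The sketch below computes sample skewness and runs a Shapiro-Wilk test with base R's `shapiro.test()`. The residuals are simulated stand-ins (an assumption, so the block runs without the fitted model), and `skewness()` is a small helper defined here, not a function from the original script.

```r
set.seed(42)

# Simulated residuals (assumption: stand-ins for mpg_model$residuals)
symmetric_resid <- rnorm(200)
skewed_resid    <- rexp(200) - 1   # right-skewed, mean near zero

# Sample skewness: near 0 for symmetric data, positive for a long right tail
skewness <- function(r) {
  centered <- r - mean(r)
  mean(centered^3) / mean(centered^2)^1.5
}

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_skewed <- shapiro.test(skewed_resid)

cat("Skewness (symmetric):", round(skewness(symmetric_resid), 2), "\n")
cat("Skewness (skewed):   ", round(skewness(skewed_resid), 2), "\n")
cat("Shapiro-Wilk p-value (skewed):", signif(sw_skewed$p.value, 3), "\n")
```

For the report's model the same calls would take `mpg_model$residuals` as input. With several hundred observations the test can reject for small departures, so it is best read alongside the plot rather than instead of it.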

# Step 6: Create a linear model with response = horsepower, and explanatory variables weight, origin, cylinders, and acceleration
# Code for creating the linear model with multiple predictors
horsepower_model <- lm(horsepower ~ weight + origin + cylinders +
acceleration, data = Auto)

# Print summary for the linear model with multiple predictors


summary(horsepower_model)

##
## Call:
## lm(formula = horsepower ~ weight + origin + cylinders + acceleration,
## data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.683 -8.221 -1.129 5.292 83.857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.837488 7.203703 11.222 < 2e-16 ***
## weight 0.029489 0.001836 16.065 < 2e-16 ***
## origin 3.154415 1.038882 3.036 0.00256 **
## cylinders 2.375640 0.953508 2.491 0.01314 *
## acceleration -5.285656 0.284090 -18.606 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.27 on 387 degrees of freedom
## Multiple R-squared: 0.8824, Adjusted R-squared: 0.8812
## F-statistic: 725.9 on 4 and 387 DF, p-value: < 2.2e-16
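
To see how the coefficients combine into a fitted value, the sketch below evaluates the regression equation by hand. The coefficients are copied from the output above; the car's predictor values are hypothetical and are not a row from Auto.

```r
# Coefficients copied from the summary(horsepower_model) output above
coefs <- c(intercept = 80.837488, weight = 0.029489, origin = 3.154415,
           cylinders = 2.375640, acceleration = -5.285656)

# Hypothetical car (assumed values, not a row from Auto)
car_values <- c(intercept = 1, weight = 3000, origin = 1,
                cylinders = 4, acceleration = 15)

# Fitted value = sum of coefficient * predictor value
predicted_hp <- sum(coefs * car_values[names(coefs)])

cat("Predicted horsepower:", round(predicted_hp, 1), "\n")
```

With the fitted object available, `predict(horsepower_model, newdata = data.frame(weight = 3000, origin = 1, cylinders = 4, acceleration = 15))` should agree up to rounding of the printed coefficients. Note that `origin` enters this model as a numeric code, so its coefficient is a per-unit change in the code rather than a per-region effect.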

# Step 7: Repeat steps 2, 3, 4, 5 for this model


#Step2: Scatter plot of residuals vs fitted values
plot(horsepower_model$fitted.values, horsepower_model$residuals,
main = "Residuals vs Fitted (Horsepower Model)",
xlab = "Fitted Values", ylab = "Residuals")
#Step3: Print summary results for the new model
model_summary_new <- summary(horsepower_model)
print(model_summary_new)

##
## Call:
## lm(formula = horsepower ~ weight + origin + cylinders + acceleration,
## data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.683 -8.221 -1.129 5.292 83.857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.837488 7.203703 11.222 < 2e-16 ***
## weight 0.029489 0.001836 16.065 < 2e-16 ***
## origin 3.154415 1.038882 3.036 0.00256 **
## cylinders 2.375640 0.953508 2.491 0.01314 *
## acceleration -5.285656 0.284090 -18.606 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.27 on 387 degrees of freedom
## Multiple R-squared: 0.8824, Adjusted R-squared: 0.8812
## F-statistic: 725.9 on 4 and 387 DF, p-value: < 2.2e-16
# Extract relevant values for commenting
residual_std_error_new <- model_summary_new$sigma
r_squared_new <- model_summary_new$r.squared

cat("\nResidual Standard Error (New Model):", residual_std_error_new, "\n")

##
## Residual Standard Error (New Model): 13.26792

cat("Multiple R-squared (New Model):", r_squared_new, "\n")

## Multiple R-squared (New Model): 0.8823972

# Answer: The scatter plot shows no distinct pattern, which is consistent
# with linearity. The Residual Standard Error (13.26792) and Multiple
# R-squared (0.8823972) describe how well the model fits; they are not
# evidence that the assumptions hold. The residual range (-27.68 to 83.86) is
# lopsided, so a few large positive residuals deserve a closer look as
# potential outliers.

#Step4: Histogram of residuals


hist(horsepower_model$residuals,
     main = "Histogram of Residuals (Horsepower Model)", xlab = "Residuals")

# QQ plot of residuals
qqPlot(horsepower_model$residuals,
       main = "QQ Plot of Residuals (Horsepower Model)")
## 14 117
## 14 116

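# Step 5: Analysis of the plot for deviations. qqPlot() labels observations
# 14 and 117 (rows 14 and 116) as the most extreme residuals. With a maximum
# residual of 83.86 against a minimum of -27.68, the residuals are right-skewed
# and only roughly normal.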
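
Points flagged on a QQ plot can also be listed numerically using standardized residuals. The sketch below fits a stand-in model on the built-in `mtcars` data (an assumption, chosen only so the block runs without ISLR2) and uses base R's `rstandard()`; the cutoff of 2 is a common rule of thumb, not a value from the original report.

```r
# Stand-in model on the built-in mtcars data (assumption: used here only so
# the block runs without ISLR2; the report's model uses Auto)
standin_model <- lm(hp ~ wt + cyl + qsec, data = mtcars)

# Standardized residuals put every residual on a common scale
std_resid <- rstandard(standin_model)

# Flag observations more than 2 standard deviations from zero
# (2 is a common rule of thumb, not a strict cutoff)
flagged <- std_resid[abs(std_resid) > 2]

print(round(flagged, 2))
```

For the report's model, `rstandard(horsepower_model)` could be filtered the same way to see whether observations 14 and 117 stand out.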
