Assignment Auto
Assignment
library(dplyr)
library(stringr)
df <- read.csv('/content/Auto.csv')
str(df)
head(df)
A data.frame: 6 × 9
mpg cylinders displacement horsepower weight acceleration year origin name
summary(df)
mpg cylinders displacement horsepower
Min. : 9.00 Min. :3.000 Min. : 68.0 Length:397
1st Qu.:17.50 1st Qu.:4.000 1st Qu.:104.0 Class :character
Median :23.00 Median :4.000 Median :146.0 Mode :character
Mean :23.52 Mean :5.458 Mean :193.5
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:262.0
Max. :46.60 Max. :8.000 Max. :455.0
weight acceleration year origin
Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
1st Qu.:2223 1st Qu.:13.80 1st Qu.:73.00 1st Qu.:1.000
Median :2800 Median :15.50 Median :76.00 Median :1.000
Mean :2970 Mean :15.56 Mean :75.99 Mean :1.574
3rd Qu.:3609 3rd Qu.:17.10 3rd Qu.:79.00 3rd Qu.:2.000
Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
name
Length:397
Class :character
Mode :character
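The summary above reports horsepower as a character column, which usually means at least one entry is not a number. A minimal sketch of one way to handle this, assuming missing values are written as "?" (an assumption about the file, not something the output confirms); the inline CSV is a made-up stand-in for Auto.csv:

```r
# Toy CSV standing in for Auto.csv; the "?" placeholder is an assumption
csv_text <- "mpg,horsepower,name
18,130,car a
25,?,car b
22,95,car c"

# Read as-is: the "?" prevents the column from being parsed as numeric
raw <- read.csv(text = csv_text)
class(raw$horsepower)

# Declaring "?" as a missing-value marker lets R parse the column as numeric
clean <- read.csv(text = csv_text, na.strings = "?")
class(clean$horsepower)
sum(is.na(clean$horsepower))
```

With the real file the same idea would be `read.csv('/content/Auto.csv', na.strings = "?")`, after which `sum(is.na(df$horsepower))` counts the missing entries.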
https://colab.research.google.com/drive/1-js3QyiIxeYYLpklM33bOqQBaHjp6w8O#scrollTo=QZNq9x2X3T_o&printMode=true 1/6
1/7/24, 2:31 AM Assignment_Auto.R - Colaboratory
sum(is.na(df$horsepower))
head(df)
A data.frame: 6 × 9
mpg cylinders displacement horsepower weight acceleration year origin name
1 18 8 307 130 3504 12.0 70 1 c…
3 18 8 318 150 3436 11.0 70 1 p…
Qualitative/Quantitative
str(df)
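Result
The quantitative variables are mpg, cylinders, displacement, horsepower, weight, acceleration, year, and origin.
The only qualitative variable is name. Note that origin is stored as an integer code (1, 2, 3) for the car's origin, so it can also be treated as qualitative.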
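The split into quantitative and qualitative columns can also be checked programmatically. The sketch below uses a small hypothetical data frame in place of df; the factor labels for origin follow the usual description of this dataset (1 = American, 2 = European, 3 = Japanese) and should be treated as an assumption here:

```r
# Hypothetical stand-in for df with one column of each kind
toy <- data.frame(
  mpg = c(18, 25, 22),
  origin = c(1, 3, 2),
  name = c("car a", "car b", "car c"),
  stringsAsFactors = FALSE
)

# Columns stored as numbers vs. everything else
is_num <- sapply(toy, is.numeric)
quantitative <- names(toy)[is_num]
qualitative <- names(toy)[!is_num]

# Recode origin as a factor if it should be treated as qualitative
# (labels are an assumption based on the dataset's usual documentation)
toy$origin <- factor(toy$origin, levels = 1:3,
                     labels = c("American", "European", "Japanese"))
str(toy)
```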
numeric_columns <- sapply(df, is.numeric)
ranges <- sapply(df[, numeric_columns], function(x) range(x, na.rm = TRUE))
print(ranges)
print(summary_mean_std)
Mean SD
mpg 23.515869 7.8258039
cylinders 5.458438 1.7015770
displacement 193.532746 104.3795833
horsepower 104.331234 38.2669944
weight 2970.261965 847.9041195
acceleration 15.555668 2.7499953
year 75.994962 3.6900049
origin 1.574307 0.8025495
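The cell that built summary_mean_std is not shown. One way to produce a table of this shape is sketched below on a small hypothetical data frame; the construction is an assumption, not necessarily the code that generated the output above:

```r
# Hypothetical stand-in for df, including one missing value
toy <- data.frame(
  mpg = c(18, 25, 22, 30),
  weight = c(3504, 2200, 2800, NA),
  name = c("a", "b", "c", "d")
)

numeric_columns <- sapply(toy, is.numeric)

# sapply returns one column per variable; t() flips it to one row per variable
summary_mean_std <- t(sapply(toy[, numeric_columns], function(x) {
  c(Mean = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
}))
print(summary_mean_std)
```

Passing `na.rm = TRUE` keeps a single missing value from turning a whole row into NA.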
Now remove the 10th through 85th observations. What is the range, mean, and
standard deviation of each predictor in the subset of the data that remains?
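# Drop rows 10 through 85 and keep only the numeric columns
subset_df <- df[-c(10:85), numeric_columns]
# c() gives a plain numeric matrix (one row per statistic); na.rm guards against missing values
summary_subset <- sapply(subset_df, function(x) c(Min = min(x, na.rm = TRUE), Max = max(x, na.rm = TRUE), Mean = mean(x, na.rm = TRUE), Sd = sd(x, na.rm = TRUE)))
print(summary_subset)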
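The row removal relies on negative indexing. A small check on a hypothetical 100-row data frame shows what `-c(10:85)` does:

```r
# Hypothetical stand-in for df: 100 numbered rows
toy <- data.frame(id = 1:100, value = (1:100) * 2)

# A negative index vector drops those row positions and keeps the rest
remaining <- toy[-c(10:85), ]

nrow(remaining)      # 100 - 76 = 24 rows remain
range(remaining$id)  # the first and last rows are untouched
```

Rows 10 through 85 are 76 observations, so the subset always has 76 fewer rows than the original.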
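# Correlation Matrix
options(width = 80)
# use = "complete.obs" drops rows with missing values so no entry comes out as NA
round(cor(df[, 1:8], use = "complete.obs"), 2)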
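If any of the eight columns contains missing values, `cor()` with its default settings returns NA for every pair involving that column. The sketch below illustrates this on a hypothetical data frame (the values are made up) and shows the `use = "complete.obs"` option:

```r
# Hypothetical data with one missing horsepower value
toy <- data.frame(
  mpg = c(18, 25, 22, 30, 15),
  horsepower = c(130, 90, NA, 70, 165),
  weight = c(3504, 2200, 2800, 2000, 4000)
)

# Default behaviour: off-diagonal entries involving horsepower are NA
cor_default <- cor(toy)

# Only rows with no missing values are used
cor_complete <- cor(toy, use = "complete.obs")
round(cor_complete, 2)
```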
Observations:
Based on the scatterplots, displacement, horsepower, and weight each appear to have a negative relationship with mpg.
Acceleration and year have a positive relationship with mpg. The trend with year suggests that cars became more fuel-efficient over time, plausibly because of advances in technology.
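The scatterplots referred to here are not reproduced in this copy. A scatterplot matrix of this kind can be drawn with `pairs()`; the sketch below uses R's built-in mtcars data as a stand-in, since it has comparable columns (mpg, disp, hp, wt). The choice of dataset and columns is an assumption for illustration only:

```r
# Built-in mtcars used as a stand-in for the Auto data (assumption)
cols <- c("mpg", "disp", "hp", "wt")

# One panel per pair of variables; the top row plots mpg against each predictor
pairs(mtcars[, cols], main = "Scatterplot matrix")
```

For the assignment data the equivalent call would be `pairs(df[, 1:8])`, provided all eight columns are numeric.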
In the correlation matrix, horsepower and weight are positively correlated. This makes sense: a heavier car generally needs more horsepower to reach the same level of performance as a lighter one, so horsepower tends to increase with weight.
Horsepower and displacement are also positively correlated, since an engine with more displacement generally produces more power.
Suppose that we wish to predict gas mileage (mpg) on the basis of the other
variables. Do your plots suggest that any of the other variables might be useful in
predicting mpg? Justify your answer.
If the relationship between a variable and mpg is strong and the plot shows a clear pattern or trend, that variable is likely to be useful in predicting mpg. We may tentatively conclude that displacement, weight, year, and origin have a statistically significant relationship with mpg, while cylinders, horsepower, and acceleration do not. Plots alone cannot establish statistical significance, so this conclusion would need to be confirmed with a fitted regression model; note also that horsepower shows a clear negative trend against mpg in the scatterplots.
On the other hand, if there is no clear relationship between a variable and mpg, that variable is less likely to be useful for prediction.
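Whether a predictor is statistically significant can be checked by fitting a linear model and reading the coefficient table. The sketch below uses the built-in mtcars data as a stand-in for the Auto data; the dataset and the choice of predictors are assumptions for illustration, not the analysis behind the answer above:

```r
# Built-in mtcars used as a stand-in for the Auto data (assumption)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Coefficient table: estimate, standard error, t value, and p-value per term
coefs <- summary(fit)$coefficients
print(round(coefs, 4))
```

A small value in the `Pr(>|t|)` column indicates that the corresponding predictor is significantly related to mpg once the other predictors are in the model. For the assignment data the analogous call would be `lm(mpg ~ . - name, data = df)`.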