
Topic

The document outlines a process for predicting car prices using linear regression, starting with a summary of the dataset and its variables. It includes data visualization steps, such as scatter plots of horsepower and RPM against price, and the creation of a linear model using selected predictors. The output of the linear model indicates the coefficients, significance levels, and overall model fit statistics.

Uploaded by

Vedant Gade

Topic: Car Price Prediction Using Linear Regression

Analysis

Code

Step 1: Summary of the dataset


> summary(raw_data2)
     car_ID      symboling         CarName            fueltype          aspiration         doornumber          carbody
 Min.   :  1   Min.   :-2.0000   Length:205         Length:205         Length:205         Length:205         Length:205
 1st Qu.: 52   1st Qu.: 0.0000   Class :character   Class :character   Class :character   Class :character   Class :character
 Median :103   Median : 1.0000   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character
 Mean   :103   Mean   : 0.8341
 3rd Qu.:154   3rd Qu.: 2.0000
 Max.   :205   Max.   : 3.0000

  drivewheel        enginelocation       wheelbase        carlength        carwidth       carheight       curbweight
 Length:205         Length:205         Min.   : 86.60   Min.   :141.1   Min.   :60.30   Min.   :47.80   Min.   :1488
 Class :character   Class :character   1st Qu.: 94.50   1st Qu.:166.3   1st Qu.:64.10   1st Qu.:52.00   1st Qu.:2145
 Mode  :character   Mode  :character   Median : 97.00   Median :173.2   Median :65.50   Median :54.10   Median :2414
                                       Mean   : 98.76   Mean   :174.0   Mean   :65.91   Mean   :53.72   Mean   :2556
                                       3rd Qu.:102.40   3rd Qu.:183.1   3rd Qu.:66.90   3rd Qu.:55.50   3rd Qu.:2935
                                       Max.   :120.90   Max.   :208.1   Max.   :72.30   Max.   :59.80   Max.   :4066

  enginetype        cylindernumber       enginesize      fuelsystem          boreratio        stroke      compressionratio
 Length:205         Length:205         Min.   : 61.0   Length:205         Min.   :2.54   Min.   :2.070   Min.   : 7.00
 Class :character   Class :character   1st Qu.: 97.0   Class :character   1st Qu.:3.15   1st Qu.:3.110   1st Qu.: 8.60
 Mode  :character   Mode  :character   Median :120.0   Mode  :character   Median :3.31   Median :3.290   Median : 9.00
                                       Mean   :126.9                      Mean   :3.33   Mean   :3.255   Mean   :10.14
                                       3rd Qu.:141.0                      3rd Qu.:3.58   3rd Qu.:3.410   3rd Qu.: 9.40
                                       Max.   :326.0                      Max.   :3.94   Max.   :4.170   Max.   :23.00

   horsepower       peakrpm        citympg        highwaympg        price
 Min.   : 48.0   Min.   :4150   Min.   :13.00   Min.   :16.00   Min.   : 5118
 1st Qu.: 70.0   1st Qu.:4800   1st Qu.:19.00   1st Qu.:25.00   1st Qu.: 7788
 Median : 95.0   Median :5200   Median :24.00   Median :30.00   Median :10295
 Mean   :104.1   Mean   :5125   Mean   :25.22   Mean   :30.75   Mean   :13277
 3rd Qu.:116.0   3rd Qu.:5500   3rd Qu.:30.00   3rd Qu.:34.00   3rd Qu.:16503
 Max.   :288.0   Max.   :6600   Max.   :49.00   Max.   :54.00   Max.   :45400

> str(data)
tibble [10,000 × 8] (S3: tbl_df/tbl/data.frame)
 $ web-scraper-order    : chr [1:10000] "1680204632-1" "1680204632-2" "1680204632-3" "1680204632-4" ...
 $ Car Model            : chr [1:10000] "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" ...
 $ Month/Year           : chr [1:10000] "2023-03" "2023-02" "2023-01" "2022-12" ...
 $ Average_price        : chr [1:10000] "967,000 EGP" "979,000 EGP" "917,000 EGP" "881,000 EGP" ...
 $ Minimum_Price        : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Maximum_Price        : chr [1:10000] "1,017,000 EGP" "1,045,000 EGP" "950,000 EGP" "950,000 EGP" ...
 $ Average_Price        : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Average_price_Cleaned: num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
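
In the structure printed above, Average_price is stored as text such as "967,000 EGP", while Average_price_Cleaned holds only NA in the rows shown. The following is a minimal sketch of one way such strings could be converted to numbers. The helper name parse_egp is hypothetical, the input values are the four printed above, and treating "," as a thousands separator is an assumption. The sketch also shows that calling as.numeric() directly on these strings gives NA, which is one possible, unconfirmed explanation for the all-NA column.

```r
# Hypothetical helper (not part of the original analysis):
# drop every character except digits and the decimal point, then convert.
# Assumes "," is a thousands separator and "EGP" is a currency suffix.
parse_egp <- function(x) {
  as.numeric(gsub("[^0-9.]", "", x))
}

# The four Average_price values printed by str(data).
avg_price_raw <- c("967,000 EGP", "979,000 EGP", "917,000 EGP", "881,000 EGP")

# Direct conversion cannot parse the separators and suffix, so it returns NA.
direct <- suppressWarnings(as.numeric(avg_price_raw))

# Conversion after stripping the non-numeric characters.
cleaned <- parse_egp(avg_price_raw)

print(direct)
print(cleaned)
```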
> str(data1)
'data.frame': 205 obs. of 26 variables:
$ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
$ CarName : chr "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
$ fueltype : chr "gas" "gas" "gas" "gas" ...
$ aspiration : chr "std" "std" "std" "std" ...
$ doornumber : chr "two" "two" "two" "four" ...
$ carbody : chr "convertible" "convertible" "hatchback" "sedan" ...
$ drivewheel : chr "rwd" "rwd" "rwd" "fwd" ...
$ enginelocation : chr "front" "front" "front" "front" ...
$ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
$ carlength : num 169 169 171 177 177 ...
$ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
$ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
$ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
$ enginetype : chr "dohc" "dohc" "ohcv" "ohc" ...
$ cylindernumber : chr "four" "four" "six" "four" ...
$ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
$ fuelsystem : chr "mpfi" "mpfi" "mpfi" "mpfi" ...
$ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
$ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
$ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
$ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
$ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
$ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
$ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
$ price : num 13495 16500 16500 13950 17450 ...
Step 2: Data Visualization (scatter plots of horsepower and peak RPM vs car price)

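library(ggplot2)

# data1 is the 205-row car data frame; the tibble named data shown in Step 1
# has no horsepower, peakrpm or price columns.
ggplot(data1, aes(x = horsepower, y = price)) +
  geom_point() +
  labs(title = "Scatter Plot of horsepower Vs price")

ggplot(data1, aes(x = peakrpm, y = price)) +
  geom_point() +
  labs(title = "Scatter Plot of RPM Vs price")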
Step 3: Building the linear model
Syntax

lm_model <- lm(price ~ highwaympg + symboling + stroke, data = data1)

> summary(lm_model)

Output:

Call:
lm(formula = price ~ highwaympg + symboling + stroke, data = data1)

Residuals:
   Min     1Q Median     3Q    Max
 -8208  -3513  -1490   1461  20534

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  34294.2     4637.8   7.395 3.76e-12 ***
highwaympg    -804.6       58.4 -13.776  < 2e-16 ***
symboling     -356.4      322.7  -1.104    0.271
stroke        1235.3     1281.8   0.964    0.336
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5736 on 201 degrees of freedom
Multiple R-squared:  0.4921,    Adjusted R-squared:  0.4845
F-statistic: 64.92 on 3 and 201 DF,  p-value: < 2.2e-16
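
To make the output easier to read, the sketch below recomputes the t values, p-values, adjusted R-squared and F-statistic from the numbers printed above, using the standard formulas for ordinary least squares. All inputs are copied from the output. The variable names are illustrative only, and because the inputs are rounded, the results agree with the printed values only up to rounding.

```r
# Values copied from the summary(lm_model) output.
n  <- 205     # observations in data1
p  <- 3       # predictors: highwaympg, symboling, stroke
r2 <- 0.4921  # Multiple R-squared

est <- c(intercept = 34294.2, highwaympg = -804.6,
         symboling = -356.4,  stroke = 1235.3)
se  <- c(intercept = 4637.8,  highwaympg = 58.4,
         symboling = 322.7,   stroke = 1281.8)

df_resid <- n - p - 1  # 201 residual degrees of freedom

# t value = estimate / standard error
t_values <- est / se

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic for the null hypothesis that all slopes are zero
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(round(t_values, 3))
print(signif(p_values, 3))
print(round(adj_r2, 4))
print(round(f_stat, 2))
```

Read this way, highwaympg is the only predictor with a p-value below 0.05. The coefficients for symboling and stroke are not statistically distinguishable from zero in this model, and the model explains roughly half of the variance in price.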
Step 4: Visualizing the linear model
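
The figure for this step is not included in the text. As a minimal sketch of one way to draw the fitted relationship, the code below plots price against highwaympg and overlays the model's line with symboling and stroke held at their sample means (0.8341 and 3.255, taken from the summary in Step 1). The five-row data frame cars5 is a hypothetical stand-in built from the first rows printed by str(data1), and whether the original figure was drawn this way is an assumption.

```r
library(ggplot2)

# First five rows of data1, copied from the str(data1) output
# (a tiny stand-in; the full 205-row data set is not reproduced here).
cars5 <- data.frame(
  highwaympg = c(27, 27, 26, 30, 22),
  price      = c(13495, 16500, 16500, 13950, 17450)
)

# Coefficients copied from the summary(lm_model) output.
b0    <- 34294.2
b_hwy <- -804.6
b_sym <- -356.4
b_str <- 1235.3

# Sample means copied from the summary of the dataset.
mean_symboling <- 0.8341
mean_stroke    <- 3.255

# Intercept of the price ~ highwaympg line when the other two
# predictors are fixed at their means.
line_intercept <- b0 + b_sym * mean_symboling + b_str * mean_stroke

p_fit <- ggplot(cars5, aes(x = highwaympg, y = price)) +
  geom_point() +
  geom_abline(intercept = line_intercept, slope = b_hwy, colour = "blue") +
  labs(title = "Fitted price vs highwaympg (other predictors at their means)")

print(p_fit)
```

With the full data available, cars5 can be replaced by data1 so that all 205 points appear around the line.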
Step 5: Plotting the predicted values
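
The figure for this step is likewise not included in the text. The sketch below shows one common way to plot predicted values: compute the predicted price for each car from the reported coefficients and plot it against the actual price, with a dashed 45-degree reference line. The data frame first_rows is a hypothetical stand-in holding the first five rows printed by str(data1), and the coefficients are copied from the model output, so the predictions carry the rounding of those printed values.

```r
library(ggplot2)

# First five rows of data1, copied from the str(data1) output (stand-in data).
first_rows <- data.frame(
  highwaympg = c(27, 27, 26, 30, 22),
  symboling  = c(3, 3, 1, 2, 2),
  stroke     = c(2.68, 2.68, 3.47, 3.40, 3.40),
  price      = c(13495, 16500, 16500, 13950, 17450)
)

# Coefficients from summary(lm_model): intercept, highwaympg, symboling, stroke.
coefs <- c(34294.2, -804.6, -356.4, 1235.3)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(1, first_rows$highwaympg, first_rows$symboling, first_rows$stroke)

# Predicted price = X %*% coefficients; residual = actual - predicted.
first_rows$predicted <- as.vector(X %*% coefs)
first_rows$residual  <- first_rows$price - first_rows$predicted

print(first_rows)

# Points on the dashed line would be predicted perfectly.
p_pred <- ggplot(first_rows, aes(x = price, y = predicted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(title = "Predicted vs actual price", x = "Actual price", y = "Predicted price")

print(p_pred)
```

When the fitted object lm_model and the full data frame are available, predict(lm_model, newdata = data1) returns the same kind of predictions without rebuilding the design matrix by hand.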
