Topic
Analysis
Code
> str(data)
tibble [10,000 × 8] (S3: tbl_df/tbl/data.frame)
$ web-scraper-order : chr [1:10000] "1680204632-1" "1680204632-2" "1680204632-3" "1680204632-4" ...
$ Car Model : chr [1:10000] "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" ...
$ Month/Year : chr [1:10000] "2023-03" "2023-02" "2023-01" "2022-12" ...
$ Average_price : chr [1:10000] "967,000 EGP" "979,000 EGP" "917,000 EGP" "881,000 EGP" ...
$ Minimum_Price : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
$ Maximum_Price : chr [1:10000] "1,017,000 EGP" "1,045,000 EGP" "950,000 EGP" "950,000 EGP" ...
$ Average_Price : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
$ Average_price_Cleaned: num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
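The structure above shows the price columns stored as character strings such as "967,000 EGP", while the numeric columns (Minimum_Price, Average_Price, Average_price_Cleaned) contain NA in every displayed row. That pattern suggests, though the output alone cannot confirm it, that a string-to-number conversion did not succeed. Below is a minimal sketch of one way to parse such strings, assuming every value consists of digits with comma separators followed by a currency label. The helper name parse_egp is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert strings such as "967,000 EGP" to numeric.
parse_egp <- function(x) {
  # Keep only digits and the decimal point, then convert to numeric.
  # Missing or empty inputs come back as NA.
  as.numeric(gsub("[^0-9.]", "", x))
}

# Sample values copied from the str(data) output above.
avg_price <- c("967,000 EGP", "979,000 EGP", "917,000 EGP", "881,000 EGP")
parse_egp(avg_price)
```

On the full tibble the same helper could be applied as data$Average_price_Cleaned <- parse_egp(data$Average_price). Note that the tibble contains both Average_price and Average_Price, which differ only in case, so the column name has to be typed exactly.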
> str(data1)
'data.frame': 205 obs. of 26 variables:
$ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
$ CarName : chr "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
$ fueltype : chr "gas" "gas" "gas" "gas" ...
$ aspiration : chr "std" "std" "std" "std" ...
$ doornumber : chr "two" "two" "two" "four" ...
$ carbody : chr "convertible" "convertible" "hatchback" "sedan" ...
$ drivewheel : chr "rwd" "rwd" "rwd" "fwd" ...
$ enginelocation : chr "front" "front" "front" "front" ...
$ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
$ carlength : num 169 169 171 177 177 ...
$ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
$ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
$ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
$ enginetype : chr "dohc" "dohc" "ohcv" "ohc" ...
$ cylindernumber : chr "four" "four" "six" "four" ...
$ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
$ fuelsystem : chr "mpfi" "mpfi" "mpfi" "mpfi" ...
$ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
$ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
$ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
$ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
$ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
$ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
$ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
$ price : num 13495 16500 16500 13950 17450 ...
Step 2: Data Visualization (scatter plot of RPM vs. car price):
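A plot of this kind could be produced along the following lines. This is a sketch only: it assumes that "RPM" refers to the peakrpm column of data1, it uses base R graphics rather than whatever plotting package the original analysis used, and it draws just the first five rows shown in the str(data1) output so that the example is self-contained.

```r
# First five rows of data1, copied from the str(data1) output.
# In practice the full data frame would be plotted instead.
sample_rows <- data.frame(
  peakrpm = c(5000, 5000, 5000, 5500, 5500),
  price   = c(13495, 16500, 16500, 13950, 17450)
)

# Scatter plot with peak RPM on the x-axis and price on the y-axis.
plot(
  sample_rows$peakrpm, sample_rows$price,
  xlab = "Peak RPM", ylab = "Price",
  main = "Peak RPM vs. car price",
  pch = 19
)
```

With the full dataset, the same call would take data1$peakrpm and data1$price as its first two arguments.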
Output:
Call:
lm(formula = price ~ highwaympg + symboling + stroke, data = data1)

Residuals:
   Min     1Q Median     3Q    Max
 -8208  -3513  -1490   1461  20534

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  34294.2     4637.8   7.395 3.76e-12 ***
highwaympg    -804.6       58.4 -13.776  < 2e-16 ***
symboling     -356.4      322.7  -1.104    0.271
stroke        1235.3     1281.8   0.964    0.336
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
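To make the coefficient table easier to read, the sketch below recomputes a fitted price by hand from the reported estimates, using the first row of data1 as shown in the str(data1) output (highwaympg = 27, symboling = 3, stroke = 2.68). It also shows how a t value follows from an estimate and its standard error. The variable names are illustrative, and because the printed coefficients are rounded, the results are approximate.

```r
# Coefficient estimates copied from the summary output above.
coefs <- c(
  "(Intercept)" = 34294.2,
  highwaympg    = -804.6,
  symboling     = -356.4,
  stroke        = 1235.3
)

# Predictor values for the first car in data1; the intercept term is 1.
first_car <- c("(Intercept)" = 1, highwaympg = 27, symboling = 3, stroke = 2.68)

# Fitted value: sum of each coefficient times its predictor value.
fitted_price <- sum(coefs * first_car[names(coefs)])
fitted_price

# A t value is Estimate / Std. Error, shown here for highwaympg.
t_highwaympg <- -804.6 / 58.4
t_highwaympg
```

The fitted price comes to about 14,811, against an observed price of 13,495 for that car. The recomputed t value of about -13.78 agrees with the reported -13.776 up to rounding. In the table, only highwaympg has a p-value below 0.05; the estimates for symboling and stroke (p = 0.271 and p = 0.336) are not statistically distinguishable from zero in this model.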