
Final DAProject

Uploaded by ckdphong232
Copyright © All Rights Reserved

INTERNATIONAL UNIVERSITY

Project
Toyota Corolla

Châu Khắc Đình Phong – ITDSIU20076


Nguyễn Ảnh Đô – ITDSIU20059
CONTENTS
1. Introduction
2. Tools Used
INTRODUCTION
In this topic, I am going to use a data set called Toyota Corolla, which contains information on the sale of used cars in Asia.

This data set has information on a variety of characteristics of the cars, such as the price each car was sold for (in euros), the model, the month and year of manufacture, the fuel type, the age, the color, whether the car has an automatic transmission, and the number of kilometers the car has been driven.
First, we show in this report how we clean the dataset and detect errors in it.

Second, we compute statistics for each attribute and compare them across every car model in the dataset.

Third, we analyze the data by checking the correlation between attributes, then plotting the distribution and calculating the variance, mode and mean of each attribute in the dataset.
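The descriptive statistics in the third step can be sketched as follows. This is a minimal Python stand-in (the project used R), and the eight age/price values are hypothetical, not rows from the dataset. It computes the mean, sample variance and mode of the price and the Pearson correlation between age and price.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical values: age in months and price in euros for eight cars.
age = [23, 24, 26, 30, 32, 40, 55, 70]
price = [13500, 13750, 13950, 14950, 13750, 12950, 9500, 7950]

summary = {
    "mean": statistics.fmean(price),
    "variance": statistics.variance(price),  # sample variance, n - 1 denominator
    "mode": statistics.mode(price),          # most frequent value
}
r = pearson(age, price)

print(summary)
print(f"correlation(age, price) = {r:.3f}")
```

A negative correlation here means older cars tend to sell for less, which is the kind of relationship the correlation check is meant to surface.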

Fourth, we predict the price from the other attributes, using the actual prices in the dataset to build several different models.

Finally, we compare and evaluate the models we built.
TOOLS USED
R is a programming language and free software developed by Ross Ihaka and Robert Gentleman in 1993. R possesses an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis and statistical inference, to name a few.

Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications. Microsoft Excel enables users to format, organize and calculate data in a spreadsheet.
PROJECT
1. Visualize, Clean and Transform Data (Excel)
- There are more than 60 cars in each of the price ranges €7,800-€7,995 and €10,900-€11,290.
- More than 100 cars are in the price range €8,800-€8,995.
- More than 80 cars are in the price range starting at €9,900.
- At the remaining prices, the number of cars drops back to about 30. There is no car in the range around €32,000.
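The counts above come from a histogram of Price. A minimal Python sketch of the binning follows. The €500 bin width and the sample prices are hypothetical, since the chart's actual bin edges are not given.

```python
from collections import Counter

BIN_WIDTH = 500  # hypothetical bin width in euros

def price_histogram(prices, width=BIN_WIDTH):
    """Return {bin lower edge: count} for bins [edge, edge + width)."""
    counts = Counter((p // width) * width for p in prices)
    return dict(sorted(counts.items()))

# Hypothetical prices in euros, not the real dataset.
prices = [7800, 7950, 7995, 8800, 8950, 8995, 9900, 10900, 11290, 32500]
hist = price_histogram(prices)

for edge, n in hist.items():
    print(f"[{edge}, {edge + BIN_WIDTH}): {n}")
```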


The chart shows the number of cars by age (Age_08_04), which ranges from 1 to 80 months.
The chart shows the sum of kilometers for each Toyota car type.

According to the chart:
- The car type with the most kilometers in total is TOYOTA Corolla 1.6 16V HATCHB LINEA TERRA 2/3-Doors, with 7,704,759 km.
- The car types with the fewest kilometers (only 1 km) are:
+ TOYOTA Corolla 1.6-16v VVT-i Linea Terra Comfort AIRCO NIEUW 5DRS 4/5-Doors.
+ TOYOTA Corolla VERSO 2.0 D4D SOL (7) BNS MPV.
+ TOYOTA Corolla 1.4-16v VVT-i Linea Terra Comfort NIEUW AIRCO 4/5-Doors.
+ TOYOTA Corolla 1.6-16v VVT-i Linea Terra Comfort NIEUW AIRCO 5drs 4/5-Doors.
+ TOYOTA Corolla 2.0 d HB Diesel 2/3-Doors.
+ TOYOTA Corolla 1.4-16v VVT-i Linea Terra Comfort NIEUW AIRCO 4/5-Doors.
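The kilometer chart is a group-by sum over the model name. Below is a minimal Python sketch with hypothetical, shortened model names and km values. Because several models tie at 1 km in the chart, the sketch keeps every model that shares the minimum instead of returning only one.

```python
from collections import defaultdict

def total_km_by_model(rows):
    """Sum the kilometers of all cars that share the same model name."""
    totals = defaultdict(int)
    for model, km in rows:
        totals[model] += km
    return dict(totals)

# Hypothetical (model, km) rows with shortened model names.
rows = [
    ("Corolla 1.6 HATCHB 2/3-Doors", 120000),
    ("Corolla 1.6 HATCHB 2/3-Doors", 95000),
    ("Corolla 1.4 Linea Terra 4/5-Doors", 60000),
    ("Corolla 2.0 D HB 2/3-Doors", 1),
    ("Corolla VERSO 2.0 D4D MPV", 1),
]
totals = total_km_by_model(rows)

most = max(totals, key=totals.get)
lowest = min(totals.values())
least = [m for m, km in totals.items() if km == lowest]  # keep every tied model

print(most, totals[most])
print(least, lowest)
```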
According to the chart:
- 1,264 cars use petrol.
- 155 cars use diesel.
- Only 17 cars use CNG.


According to the chart:
- Grey is the most popular color, with more than 300 cars.
- Red and Blue come next, with roughly 280 cars each.
- Beige, Violet and Yellow are the least popular colors.


According to the chart:
- Most of the cars are manual (Automatic = 0).
- Fewer than 200 cars have an automatic transmission (Automatic = 1).
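The fuel, color and transmission charts are all frequency counts of a categorical column. A minimal Python sketch with a handful of hypothetical rows follows. The tuple layout (Fuel_Type, Color, Automatic) is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical rows: (Fuel_Type, Color, Automatic), where Automatic is 0 or 1.
cars = [
    ("Petrol", "Grey", 0),
    ("Petrol", "Blue", 0),
    ("Diesel", "Red", 0),
    ("Petrol", "Grey", 1),
    ("CNG", "Yellow", 0),
    ("Petrol", "Red", 0),
]

fuel_counts = Counter(fuel for fuel, _, _ in cars)
color_counts = Counter(color for _, color, _ in cars)
automatic_counts = Counter(auto for _, _, auto in cars)

for name, counts in [("Fuel_Type", fuel_counts), ("Color", color_counts),
                     ("Automatic", automatic_counts)]:
    print(name, counts.most_common())  # categories sorted from most to least frequent
```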


Checking for missing values by using R
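In R, a common way to do this check is `colSums(is.na(df))`. As a rough Python equivalent, the sketch below counts empty or "NA" cells per column of a CSV. The three sample rows are hypothetical, and treating "NA" as missing is an assumption that mirrors R's default for `read.csv`.

```python
import csv
import io

def missing_per_column(csv_text):
    """Count empty or 'NA' cells in each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is None or value.strip() in ("", "NA"):
                missing[name] += 1
    return missing

sample = """Price,Age_08_04,KM,Fuel_Type
13500,23,46986,Diesel
13750,,72937,Diesel
13950,24,NA,Diesel
"""
print(missing_per_column(sample))
```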

Transforming categorical values into ordinal codes using Excel
Processed data for analysis
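The transformation step can be sketched in Python as shown below. The ordinal codes follow alphabetical order unless an explicit order is given. The regression section later refers to columns named Fuel_Type_Diesel and Fuel_Type_Petrol, which look like 0/1 dummy columns, so the sketch also shows dummy coding. Using CNG as the omitted baseline is our assumption.

```python
def ordinal_encode(values, order=None):
    """Map each category to an integer code; alphabetical order unless `order` is given."""
    categories = list(order) if order is not None else sorted(set(values))
    codes = {cat: i for i, cat in enumerate(categories)}
    return [codes[v] for v in values], codes

def dummy_encode(values, prefix, baseline):
    """One 0/1 column per category except `baseline`, e.g. Fuel_Type_Diesel."""
    categories = sorted(set(values) - {baseline})
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]

fuel = ["Diesel", "Petrol", "CNG", "Petrol"]
codes, mapping = ordinal_encode(fuel)
dummies = dummy_encode(fuel, prefix="Fuel_Type", baseline="CNG")

print(mapping)
print(codes)
print(dummies)
```

Ordinal codes impose an order (CNG < Diesel < Petrol) that fuel types do not really have. Dummy columns avoid that, which is why regression models usually take categorical inputs in the second form.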
2. Analyze the Data

Use R to plot the distribution of each attribute


Predicting Price with 3 Models:
Regression Analysis, Random Forest, Gradient Boosting
Before using R to predict, we split the data into training and test sets.
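The split ratio and random seed used in the project are not stated. The Python sketch below assumes an 80/20 split and a fixed seed so that the split can be reproduced. In R the same idea is usually written with `set.seed()` and `sample()`.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 car records
train, test = train_test_split(rows)
print(len(train), len(test))
```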
Result

CASE 1: Regression Analysis
Plot of the regression analysis model
We then use the fitted model to predict the prices in the dataset.

The Regression Analysis model gives an RMSE of 1567.824.


Plot of predicted price compared to actual price
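All three models are compared by RMSE, which is the square root of the mean squared difference between actual and predicted prices. It is expressed in the same unit as Price (euros). The following minimal Python sketch uses hypothetical values where every prediction is off by exactly 1000, so the RMSE comes out to exactly 1000.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

actual = [13500, 13750, 9500, 7950]       # hypothetical actual prices
predicted = [12500, 14750, 10500, 6950]   # hypothetical predictions
print(rmse(actual, predicted))  # 1000.0
```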
Result

CASE 2: Random Forest
Feature importance

Random forest data

We then use this model to predict the price.

The Random Forest model gives an RMSE of 2101.017.


Plot of predicted price compared to actual price
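To show the idea behind the random forest, here is a heavily simplified Python sketch. Each tree is a single split (a stump) trained on a bootstrap sample, using one randomly chosen feature, and the forest averages the trees' predictions. The R implementation the project used grows much deeper trees. The [age, km] rows and prices are hypothetical.

```python
import random
import statistics

def fit_stump(X, y, features):
    """Best single split over `features`, by total squared error of the two leaf means."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:  # the max would leave the right side empty
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every candidate feature is constant: predict the mean
        m = statistics.fmean(y)
        return (None, None, m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    if j is None:
        return ml
    return ml if row[j] <= t else mr

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    return statistics.fmean(predict_stump(s, row) for s in forest)

# Hypothetical rows: [age in months, km] and prices in euros.
X = [[20, 30000], [25, 40000], [30, 50000], [45, 80000], [60, 110000], [70, 150000]]
y = [15000, 14000, 13000, 10000, 8500, 7000]
forest = fit_forest(X, y)
pred_young = predict_forest(forest, [22, 35000])
pred_old = predict_forest(forest, [68, 140000])
print(round(pred_young), round(pred_old))
```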
CASE 3: Gradient Boosting
Result
Finding the important variables
We use the model to predict the price of each car.

The Gradient Boosting model gives an RMSE of 1329.32.


Plot of predicted price compared to actual price
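For squared-error loss, gradient boosting starts from the mean price. It then repeatedly fits a small tree to the current residuals and adds a shrunken copy of that tree to the model. The Python sketch below uses single-split trees, 100 rounds and a learning rate of 0.1. These settings are illustrative assumptions, not the project's R settings. The data are the same kind of hypothetical [age, km] rows.

```python
import math
import statistics

def fit_stump(X, r):
    """Single split over all features minimizing squared error against residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        m = statistics.fmean(r)
        return (None, None, m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    if j is None:
        return ml
    return ml if row[j] <= t else mr

def fit_gbm(X, y, n_rounds=100, learning_rate=0.1):
    base = statistics.fmean(y)  # start from the mean price
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the loss (y - f)^2 / 2.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical rows: [age in months, km] and prices in euros.
X = [[20, 30000], [25, 40000], [30, 50000], [45, 80000], [60, 110000], [70, 150000]]
y = [15000, 14000, 13000, 10000, 8500, 7000]
model = fit_gbm(X, y)
fitted = [predict_gbm(model, row) for row in X]
baseline = [statistics.fmean(y)] * len(y)
print(f"training RMSE: mean-only {rmse(y, baseline):.1f}, boosted {rmse(y, fitted):.1f}")
```

The learning rate trades speed for stability. Smaller steps need more rounds but usually generalize better, and this is one of the knobs meant by "the model can be further tuned".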
Conclusion
According to the three RMSE values:
- Gradient Boosting: RMSE_gbm = 1329.32
- Linear Regression: RMSE_lr = 1567.824
- Random Forest: RMSE_rf = 2101.017

The Gradient Boosting Model is better than the others at predicting the price, because the lower the RMSE, the better a given model fits a dataset. The variables that are useful for predicting the price are Age_08_04, Mfg_year and KM. The accuracy of the model's predictions, measured by RMSE, is 1341.965. To improve the RMSE, the model can be further tuned.

Analyze whether each collected attribute has a positive or a negative effect on the target variable.
According to the linear regression we fitted:
- Based on the coefficient table, the attributes with positive effects are Fuel_Type_Diesel, Fuel_Type_Petrol and Automatic. As we would anticipate, a car's price increases if it has an automatic transmission. In addition, a diesel engine raises the price of the car more than a petrol engine does.

- Next, Age and KM have negative coefficients, which indicates that they lower the price of a car. In other words, the older the car, the lower the price.

- In the p-value column (Pr(>|t|)), four entries are below 0.05, so these terms are statistically significant at the 5% level. In this case, Age_08_04, KM and Automatic are statistically significant.
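The sign of a coefficient and its p-value can be illustrated with a single predictor. The Python sketch below fits price on age by least squares on synthetic data (a slope of -120 per month plus random noise, chosen only for illustration). It then computes the slope's standard error, the t value and a two-sided p-value. For simplicity, the p-value uses a normal approximation to the t distribution, which is reasonable only for larger samples. R's `summary(lm(...))` uses the exact t distribution.

```python
import math
import random
from statistics import NormalDist, fmean

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return a, b, SE(b), t value and p-value."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    t = b / se_b
    # Two-sided p-value, normal approximation to the t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return a, b, se_b, t, p

rng = random.Random(1)
age = list(range(1, 81))  # hypothetical ages in months
price = [20000 - 120 * m + rng.gauss(0, 800) for m in age]  # synthetic prices

intercept, slope, se_slope, t_stat, p_value = simple_ols(age, price)
print(f"slope = {slope:.1f} (SE {se_slope:.1f}), t = {t_stat:.1f}, p = {p_value:.3g}")
```

A negative slope with a p-value below 0.05 is the single-variable version of the reading above: each extra month of age lowers the expected price, and the effect is unlikely to be due to chance.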
THANK YOU FOR FOLLOWING
