National University of Sciences and Technology
School of Mechanical and Manufacturing Engineering
Robotics and Intelligent Machines Engineering
Project Report: Car Price Prediction
Intent:
Train a machine learning algorithm to predict the car price based on the selected features.
Method:
The machine learning method used for this project, Car Price Prediction, is Multiple Linear
Regression with Gradient Descent.
Process:
1. Packages:
• Numpy
• Pandas
• [Link]
2. Data:
The dataset was collected from PAKWHEELS, processed, and saved in CSV (comma-delimited) format.
The data is loaded with the Pandas read_csv() function and then separated into features and label,
stored as ‘x_train’ and ‘y_train’, respectively.
The data() function returns ‘x_train’ and ‘y_train’ to the main() function for further use.
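A minimal sketch of what such a data() function might look like is given below; the CSV file name
and the ‘Price’ label column are illustrative assumptions, not taken from the original code.

    import pandas as pd

    def data(path="pakwheels_cars.csv"):
        # Load the processed PAKWHEELS dataset from CSV (file name assumed).
        df = pd.read_csv(path)
        # Every column except the price is treated as a feature;
        # the label column name 'Price' is an assumption.
        x_train = df.drop(columns=["Price"]).values
        y_train = df["Price"].values
        return x_train, y_train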
3. Normalization:
The feature set is then normalized using the formula
X = (X − mean) / (standard deviation)
The normal() function returns ‘x_train’, ‘data_mean’ and ‘data_std’ (the normalized features, the
mean of the features and the standard deviation of the features, respectively) to the main()
function. ‘data_mean’ and ‘data_std’ will later be used to normalize the TEST INPUT, in the [Link],
to predict the car price.
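A sketch of the normal() function, assuming plain NumPy z-score normalization:

    import numpy as np

    def normal(x_train):
        # Column-wise mean and standard deviation of the features.
        data_mean = np.mean(x_train, axis=0)
        data_std = np.std(x_train, axis=0)
        # Apply X = (X - mean) / (standard deviation).
        x_train = (x_train - data_mean) / data_std
        return x_train, data_mean, data_std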
4. Complete Data
The feature set is completed for processing by adding a ‘bias’ unit to every row in the
data_complete() function, using the [Link]() function.
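A sketch of how data_complete() could append the bias column with NumPy (the exact helper used
in the original code is not shown here):

    import numpy as np

    def data_complete(x):
        # Prepend a column of ones (the bias unit) to every row.
        ones = np.ones((x.shape[0], 1))
        return np.hstack((ones, x))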
5. Parameter Initialization
The parameter vector ‘theta’ is randomly initialized using [Link]() and is
returned to the main() function.
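A sketch of the random initialization, assuming NumPy’s uniform random number generator:

    import numpy as np

    def init_theta(n_params):
        # One parameter per column of the completed feature set,
        # including the bias column (10 parameters for this dataset).
        return np.random.rand(n_params, 1)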
6. Gradient Descent
The gradient() function is used to optimize the parameters theta and reduce the cost iteratively:
on each iteration the theta values are updated and the cost is recorded, which generates a cost
history. The last training cost is then multiplied by one million to obtain the final cost for the
training set.
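A sketch of a gradient descent routine of this kind; the learning rate and iteration count below are
placeholder assumptions, and y is expected as an (m, 1) column vector as described later in the
report.

    import numpy as np

    def cost(x, y, theta):
        # Mean squared error cost: J = sum((X.theta - y)^2) / (2m)
        m = y.shape[0]
        h = x.dot(theta)
        return float(np.sum((h - y) ** 2) / (2 * m))

    def gradient(x, y, theta, alpha=0.01, iterations=1000):
        # Batch gradient descent: update theta on each iteration and
        # record the cost to build the cost history.
        m = y.shape[0]
        cost_history = []
        for _ in range(iterations):
            h = x.dot(theta)
            theta = theta - (alpha / m) * x.T.dot(h - y)
            cost_history.append(cost(x, y, theta))
        return theta, cost_history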
7. Mean Absolute Error:
The mae() function calculates the mean absolute error on the training set, the CV set and the test
set of the data. It is called in the main() function and its results are displayed to the user with
print().
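A sketch of the mae() function, following the error formula quoted later in the report:

    import numpy as np

    def mae(x, y, theta):
        # Mean absolute error: sum(|h - y|) / m
        m = y.shape[0]
        h = x.dot(theta)
        return float(np.sum(np.abs(h - y)) / m)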
Flow of Algorithm:
Retrieve Data → Data Cleansing → Define Features and Label → Separate the data into: 1. Training
set, 2. Cross-validation set, 3. Test set.
Normalize the Data → Append bias units → Choose a Learning Algorithm.
Train Model: 1. Generate random theta values → 2. Calculate the hypothesis → 3. Calculate the
initial cost function → 4. Gradient descent: update the values of theta iteratively to generate a
cost history.
Predict Car Prices.
Code Running Instructions:
The code has been split into two parts: Training and Prediction.
Training:
In the training part, the data is read from a CSV file and separated into a training feature set and
a training label set. For the provided dataset, the dimensions of the feature set and the label set
are (16281, 9) and (16281,), respectively.
The data is then split into another pair of sets, the cross-validation feature set and the
cross-validation label set, with dimensions (5425, 9) and (5425,), respectively.
Finally, the last pair of sets, the test feature set and the test label set, is created in the same
way, with dimensions (5428, 9) and (5428,).
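For illustration, a split that reproduces the reported shapes (roughly 60/20/20 of the 27,134 rows)
might look like the sketch below; the exact split logic of the original code is assumed, not quoted.

    def split_data(x, y):
        # Split the feature matrix and labels into training,
        # cross-validation and test sets matching the reported
        # row counts 16281 / 5425 / 5428.
        train_end = 16281
        cv_end = train_end + 5425
        x_train, y_train = x[:train_end], y[:train_end]
        x_cv, y_cv = x[train_end:cv_end], y[train_end:cv_end]
        x_test, y_test = x[cv_end:], y[cv_end:]
        return x_train, y_train, x_cv, y_cv, x_test, y_test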
These feature sets are then completed by appending a column of ones (the bias units), and the label
sets are reshaped into column vectors, so the resulting dimensions become
(16281, 10), (16281, 1), (5425, 10), (5425, 1), (5428, 10) and (5428, 1).
Random theta values are initialized. An example value set of theta (a 10 × 1 column vector) is
[0.49354865, 0.62653841, 0.11832303, 0.0742843, 0.42119429, 0.39886133, 0.27029176, 0.76941718,
0.92276763, 0.51739262].
After completing the sets, the cost formula is used to evaluate the cost function before and after
training, and also for the cross-validation data set. These values are:
Cost before training: 1.4038338473544203
Final Cost for Train Set: 0.28237719451856336
Final Cost for CV Set: 0.2663015740160541
The graph of the final cost for the training set and the cross-validation set is plotted as:
Figure 1: Final cost for training set and cross-validation set
The next task is to calculate the mean absolute error on the three sets created earlier.
This is done using the mean absolute error formula, error = [Link](abs(h − y)) / m,
where all the variables used have already been defined in the code.
The results for these errors are
Mean Absolute Error for Training Set: 0.38244467157992945
Mean Absolute Error for CV Set: 0.3486779215566904
Mean Absolute Error for Test Set: 0.3678959796760039.
Prediction:
The training2 code file is imported into this prediction file so that the values calculated for all
the created sets can be reused.
In the prediction() function an array of the nine input features is created, and the data is
normalized using the mean and standard deviation obtained from the training2 file.
This array is then appended with a bias unit of one so that the dot product of the test input with
the final value of theta can be calculated.
The value obtained in the previous step is multiplied by one million to obtain an appropriate price
for the car whose price is to be predicted.
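A sketch of how such a prediction() step could be written; the feature ordering and the names
imported from training2 are assumptions made for illustration.

    import numpy as np

    def prediction(features, data_mean, data_std, theta):
        # 'features' holds the nine raw feature values of the car.
        x = (np.asarray(features, dtype=float) - data_mean) / data_std
        # Prepend the bias unit so the shape matches theta (10 x 1).
        x = np.hstack(([1.0], x))
        # Scale by one million, since the model was trained on prices
        # expressed in millions.
        return x.dot(theta).item() * 1_000_000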
An example prediction is attached in the following snapshot.
Conclusion:
The algorithm used in this program predicts car prices by dividing the provided data into training,
cross-validation and test sets. The features are normalized and the calculations described above are
performed to generate efficient prediction results.