PRJ Car Price Prediction For Data Science

The document describes building several machine learning models to predict car prices using a dataset of Audi cars. It performs exploratory data analysis on the dataset, including feature engineering like encoding and scaling. Several regression models are trained and compared, including random forest, linear regression, extra trees, and CatBoost regressors. The top-performing CatBoost model is saved using pickle for future use in predicting car prices.

Uploaded by

shivaybhargava33

CAR PRICE PREDICTION

pip install pandas-profiling

Data Set: audi.csv


Dependent variable: price

Import Library
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

Check Current Directory


os.getcwd()

Change the directory


os.chdir('C:\\Noble\\Training\\Acmegrade\\Data Science\\Projects\\PRJ Car Price Prediction\\')
os.getcwd()
Read Data, display records
df=pd.read_csv("audi.csv")
display(df)

Automated Exploratory Data Analysis (EDA)

Pandas Profiling Report


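# pandas-profiling has since been renamed ydata-profiling; on newer installs use: import ydata_profiling as pf
import pandas_profiling as pf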
display(pf.ProfileReport(df))

Manual EDA
Number of records
len(df)

Number of rows and columns – Shape


display (df.shape)

Checking the data types


display (df.dtypes )

Checking null values


display (df.isna().sum() )

Data set details – Info


df.info()

Data set details – Describe


df.describe ()

Create X
X = df.iloc[:,[0,1,3,4,5,6,7,8]].values
display (X)

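Create Y
# Select the target as a 1-D array; a 2-D column vector makes some estimators emit a DataConversionWarning
Y = df.iloc[:, 2].values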
display (Y)
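Positional indices such as `[0,1,3,4,5,6,7,8]` stop pointing at the right columns if the CSV column order ever changes. A minimal sketch of selecting features and target by name instead is shown here. The toy frame, its column names, and its values are assumptions for illustration and may not match audi.csv exactly.

```python
import pandas as pd

# Toy stand-in for audi.csv; column names and values are illustrative assumptions.
toy = pd.DataFrame({
    "model": ["A1", "A3", "A4"],
    "year": [2017, 2016, 2019],
    "price": [12500, 16500, 25000],
    "transmission": ["Manual", "Automatic", "Manual"],
    "mileage": [15735, 36203, 1998],
})

target = "price"
# Every column except the target becomes a feature, whatever its position.
feature_cols = [c for c in toy.columns if c != target]

X_named = toy[feature_cols]  # DataFrame of features
y_named = toy[target]        # 1-D Series holding the dependent variable

print(feature_cols)
print(X_named.shape, y_named.shape)
```

Selecting by name also keeps the column labels attached, which makes later encoding steps easier to read.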

Display Top 5 - X variable


display(pd.DataFrame(X).head(5))

Label Encoding
from sklearn.preprocessing import LabelEncoder
le1 = LabelEncoder()
X[:,0] = le1.fit_transform(X[:,0])
le2 = LabelEncoder()
X[:,-4] = le2.fit_transform(X[:,-4])
display (X)
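To see what label encoding does to a column, the sketch below runs `LabelEncoder` on a small hand-made array. The fuel-type values are assumptions chosen for illustration, not rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, for illustration only.
fuel = np.array(["Petrol", "Diesel", "Hybrid", "Diesel"])

le_demo = LabelEncoder()
codes = le_demo.fit_transform(fuel)

# classes_ holds the sorted unique labels; each code is an index into it.
print(le_demo.classes_)
print(codes)

# inverse_transform maps the integer codes back to the original labels.
print(le_demo.inverse_transform(codes))
```

The integer codes follow alphabetical order, not any meaningful ranking, which is one reason a column such as transmission is one-hot encoded instead.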

One hot Encoding to column – transmission


from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [2])],
                       remainder='passthrough')
X = ct.fit_transform(X)
display (pd.DataFrame(X))

Display – X
display (pd.DataFrame(X))
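The effect of the one-hot step is easier to follow on a tiny frame. The sketch below applies the same `ColumnTransformer` pattern to a hypothetical two-column DataFrame; the values are made up, and `sparse_threshold=0` is added here only to force a dense array for printing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "transmission": ["Manual", "Automatic", "Semi-Auto", "Manual"],
    "mileage": [15735, 36203, 29946, 25952],
})

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), ["transmission"])],
    remainder="passthrough",   # keep the other columns unchanged
    sparse_threshold=0,        # always return a dense array
)
encoded = ct_demo.fit_transform(toy)

# One output column per category (sorted), followed by the passthrough columns.
categories = ct_demo.named_transformers_["encoder"].categories_[0]
print(list(categories))
print(encoded)
```

Note that the encoded columns are placed first in the output, so the column positions of the remaining features shift after this step.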

Features Scaling – Standardization

from sklearn.preprocessing import StandardScaler


sc = StandardScaler()
X = sc.fit_transform(X)
display (pd.DataFrame(X))
Train Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
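In this walkthrough the scaler is fitted on all of X before the split, so its mean and variance include the test rows. One common alternative, sketched here on synthetic data rather than the Audi dataset, is to split first and let a `Pipeline` fit the scaler on the training portion only. The data, coefficients, and variable names below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (assumed, not from audi.csv).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Split first, so the test rows never influence the scaling statistics.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# The pipeline fits StandardScaler on X_tr only, then reuses it on X_te.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

print(pipe.score(X_te, y_te))  # R2 on the held-out rows
```

With `test_size=0.2`, 80 of the 100 rows are used for fitting and 20 are held out.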

Create Random Forest Regressor

from sklearn.ensemble import RandomForestRegressor


regression = RandomForestRegressor(random_state=0)
regression.fit(X_train,Y_train)

Prediction with Test Data


y_pred = regression.predict(X_test)
display (y_pred)

Display actual and Predicted Values

print(np.concatenate((y_pred.reshape(len(y_pred),1),Y_test.reshape(len(Y_test),1)),1))

Display – R2 Score and Mean Absolute Error

from sklearn.metrics import r2_score,mean_absolute_error


print ('R2 Score ', r2_score(Y_test, y_pred))
print ('Mean Absolute Error', mean_absolute_error(Y_test,y_pred))
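For reference, both metrics can be computed by hand: MAE is the mean of |y - ŷ|, and R2 is 1 - SS_res / SS_tot. The sketch below checks the manual formulas against scikit-learn on four made-up prices; the numbers are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted prices, for illustration only.
y_true = np.array([10000.0, 15000.0, 20000.0, 25000.0])
y_hat = np.array([11000.0, 14000.0, 21000.0, 24000.0])

# Mean absolute error: average size of the prediction error.
mae_manual = np.mean(np.abs(y_true - y_hat))

# R2: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

R2 measures the share of variance explained and can be negative for a poor model, so it is not an accuracy percentage; MAE is expressed in the same units as price.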

Create a Linear Regression Model

from sklearn.linear_model import LinearRegression


reg = LinearRegression()
reg.fit(X_train,Y_train)

Prediction with Test Data


y_pred = reg.predict(X_test)
display (y_pred)

Display actual and Predicted Values

print(np.concatenate((y_pred.reshape(len(y_pred),1),Y_test.reshape(len(Y_test),1)),1))

Display – R2 Score and Mean Absolute Error

from sklearn.metrics import r2_score,mean_absolute_error


print ('R2 Score ', r2_score(Y_test, y_pred))
print ('Mean Absolute Error', mean_absolute_error(Y_test,y_pred))

Prediction for complete data set


y_pred = reg.predict(X)
display (y_pred)
Display the Actual and predicted data

result = pd.concat([df,pd.DataFrame(y_pred)],axis=1)
display( result)

Create Model Extra Tree Regressor


from sklearn.ensemble import ExtraTreesRegressor
ET_Model=ExtraTreesRegressor(n_estimators = 120)
ET_Model.fit(X_train,Y_train)
y_predict=ET_Model.predict(X_test)
from sklearn.metrics import r2_score,mean_absolute_error
print ('R2 Score ', r2_score(Y_test, y_predict))
print ('Mean Absolute Error', mean_absolute_error(Y_test,y_predict))

Display the Result


# Use the Extra Trees model here; reg is the earlier LinearRegression model
y_pred = ET_Model.predict(X)
display (y_pred)
result = pd.concat([df,pd.DataFrame(y_pred)],axis=1)
display( result)

RandomizedSearchCV
# Hyperparameter Tuning and RandomizedSearchCV - Model used – RandomForestRegressor

from sklearn.model_selection import RandomizedSearchCV


n_estimators = [int(x) for x in np.linspace(start = 80, stop = 1500, num = 10)]
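# 'auto' was removed from RandomForestRegressor in scikit-learn 1.3; 1.0 (all features) is its equivalent
max_features = [1.0, 'sqrt']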
max_depth = [int(x) for x in np.linspace(6, 45, num = 5)]
min_samples_split = [2, 5, 10, 15, 100]
min_samples_leaf = [1, 2, 5, 10]

# create random grid

rand_grid={'n_estimators': n_estimators,
'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf}

rf=RandomForestRegressor()

rCV=RandomizedSearchCV(estimator=rf,param_distributions=rand_grid,
                       scoring='neg_mean_squared_error',n_iter=3,cv=3,random_state=42, n_jobs = 1)

Fit Model
import warnings
warnings.filterwarnings('ignore')

rCV.fit(X_train,Y_train)
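After fitting, a `RandomizedSearchCV` object exposes which sampled combination won. The sketch below runs a deliberately small search on synthetic data to show `best_params_`, `best_score_`, and `cv_results_`; the data and the reduced grid are assumptions chosen so the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (assumed, not from audi.csv).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 4))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=60)

# A much smaller grid than the one in the walkthrough, to keep the run short.
param_grid = {
    "n_estimators": [10, 20, 30],
    "max_depth": [3, 6],
    "min_samples_leaf": [1, 2],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_grid,
    scoring="neg_mean_squared_error",
    n_iter=3,        # only 3 of the 12 combinations are sampled
    cv=3,
    random_state=42,
)
search.fit(X_demo, y_demo)

print(search.best_params_)                   # winning combination
print(-search.best_score_)                   # its mean cross-validated MSE
print(len(search.cv_results_["params"]))     # number of combinations tried
```

Because the scoring is the negated MSE, `best_score_` is at most zero and its sign is flipped before printing. With `n_iter=3` the search samples only a small part of the grid, so a larger `n_iter` may find better settings at the cost of run time.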

Prediction
rf_pred=rCV.predict(X_test)
display (rf_pred)

Mean_absolute_error and mean_squared_error

from sklearn.metrics import mean_absolute_error,mean_squared_error


print('MAE',mean_absolute_error(Y_test,rf_pred))
print('MSE',mean_squared_error(Y_test,rf_pred))

Display R2 Score
display (r2_score(Y_test,rf_pred))

Install Cat boost


pip install catboost

Model CatBoostRegressor
from catboost import CatBoostRegressor
cat=CatBoostRegressor()
cat.fit(X_train,Y_train)

Cat Boost Prediction


cat_pred=cat.predict(X_test)
display (cat_pred)

Cat Boost R2 Score


display (r2_score(Y_test,cat_pred))
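Since the same metrics are printed separately for each model, it can help to collect them in one table. The sketch below does this for the three scikit-learn models on synthetic data; CatBoost is left out so the example needs no extra install, and the data and the `n_estimators=50` setting are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed, not from audi.csv).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.2, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# Fit each model on the same split and record its test metrics.
rows = []
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "model": name,
        "r2": r2_score(y_te, pred),
        "mae": mean_absolute_error(y_te, pred),
    })

# Best R2 first.
summary = pd.DataFrame(rows).sort_values("r2", ascending=False).reset_index(drop=True)
print(summary)
```

The ranking on this synthetic data says nothing about the ranking on the car dataset; only the pattern of evaluating every model on one shared split carries over.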

Create Pickle File


#Use pickle to save our model so that we can use it later
import pickle
# Saving model to disk
# A with-block closes the file even if dump() raises
with open('model.pkl', 'wb') as f:
    pickle.dump(cat, f)

Load Pickle File and do Prediction


# Reload the saved model and confirm it still predicts
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
model.predict(X_train)
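A saved model only accepts inputs that were encoded and scaled the same way as its training data, so it can be convenient to pickle the preprocessing together with the estimator. The sketch below round-trips a small scikit-learn pipeline through pickle and checks that predictions are unchanged. `LinearRegression` stands in for CatBoost here, and the synthetic data and the file name `model_demo.pkl` are assumptions.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed, not from audi.csv).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))
y_demo = X_demo @ np.array([1.5, -2.0]) + 3.0

# Scaler and model travel together in one object.
bundle = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)

# Write to a temporary directory so no existing model.pkl is overwritten.
path = os.path.join(tempfile.mkdtemp(), "model_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(bundle, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored pipeline should reproduce the original predictions.
same = np.allclose(bundle.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.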
