DIAMOND PRICE PREDICTIONS - Ipynb - Colaboratory
https://colab.research.google.com/drive/1pHT5QVHz-pegX8-eo5BKaRazDN1iWtvX#scrollTo=3d0f07df&printMode=true 1/21
11/25/23, 12:08 PM DIAMOND PRICE PREDICTIONS.ipynb - Colaboratory
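import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression
warnings.filterwarnings("ignore")
dia=pd.read_csv("diamonds.csv")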
price: price in US dollars ($326 to $18,823). This is the target column, holding the label the models learn to predict from the other features.
The 4 Cs of diamonds:
carat (0.2 to 5.01): The carat is the diamond's physical weight measured in metric carats. One carat equals 1/5 gram and is subdivided into 100 points. Carat weight is the most objective grade of the 4 Cs.
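The carat definition above reduces to simple arithmetic (1 carat = 0.2 g = 100 points). A minimal Python sketch of the conversion, applied to the dataset's carat range; the helper name `carat_to_grams_and_points` is illustrative, not part of the notebook:

```python
def carat_to_grams_and_points(carat: float) -> tuple[float, float]:
    """Convert a metric carat weight to grams and points (1 ct = 0.2 g = 100 points)."""
    grams = carat * 0.2    # one carat is 1/5 gram
    points = carat * 100   # one carat is subdivided into 100 points
    return grams, points

# The dataset's carat range (0.2 to 5.01) expressed in grams and points
for ct in (0.2, 5.01):
    g, pts = carat_to_grams_and_points(ct)
    print(f"{ct} ct = {g:.3f} g = {pts:.0f} points")
```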
cut (Fair, Good, Very Good, Premium, Ideal): In determining the quality of the cut, the diamond grader evaluates the cutter's skill in fashioning the diamond. The more precisely the diamond is cut, the more captivating it is to the eye.
color, from J (worst) to D (best): Gem-quality diamonds occur in many hues, ranging from colourless to light yellow or light brown. Colourless diamonds are the rarest. Other natural colours (blue, red or pink, for example) are known as "fancy", and their colour grading differs from that of white (colourless) diamonds.
clarity (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)): Diamonds can have internal characteristics known as inclusions or external characteristics known as blemishes. Diamonds without inclusions or blemishes are rare; however, most characteristics can only be seen under magnification.
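The cut, color and clarity descriptions above define worst-to-best scales that a plain string column does not know about (alphabetically, "Fair" sorts before "Good", but "Ideal" also sorts before "Premium"). A sketch, assuming pandas, that records those orders in ordered `CategoricalDtype`s so sorting and comparisons follow the grading scale; the names `GRADE_TYPES` and `sample` are illustrative:

```python
import pandas as pd

# Worst-to-best orders taken from the grade descriptions above
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

GRADE_TYPES = {
    "cut": pd.CategoricalDtype(CUT_ORDER, ordered=True),
    "color": pd.CategoricalDtype(COLOR_ORDER, ordered=True),
    "clarity": pd.CategoricalDtype(CLARITY_ORDER, ordered=True),
}

sample = pd.Series(["VS1", "I1", "IF", "SI2"]).astype(GRADE_TYPES["clarity"])
print(sample.sort_values().tolist())  # sorted worst to best, not alphabetically
print((sample > "VS2").tolist())       # which stones grade better than VS2
```

On the notebook's data, converting a column this way (e.g. `dia['clarity'].astype(GRADE_TYPES['clarity'])`, before the label-encoding step) would also make seaborn draw categorical axes in grade order rather than in order of appearance.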
Dimensions
x: length in mm (0 to 10.74)
y: width in mm (0 to 58.9)
z: depth in mm (0 to 31.8)
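Each dimension range above starts at 0 mm, which cannot be a real length, width or depth, so those zeros most likely stand for missing measurements. A sketch, assuming pandas and this notebook's column names x, y and z, that flags such rows; the helper name and the three demo rows are made up for illustration:

```python
import pandas as pd

def zero_dimension_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where any of the x, y, z dimensions is exactly 0 mm."""
    mask = (df[["x", "y", "z"]] == 0).any(axis=1)
    return df[mask]

# Illustrative rows only, not taken from diamonds.csv
demo = pd.DataFrame({"x": [3.95, 0.0, 6.5], "y": [3.98, 0.0, 6.6], "z": [2.43, 0.0, 0.0]})
print(zero_dimension_rows(demo).index.tolist())
```

On the real data this would be `zero_dimension_rows(dia)`; whether to drop or impute the flagged rows is a modelling choice.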
dia.head()
dia.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  53940 non-null  int64
 1   carat       53940 non-null  float64
 2   cut         53940 non-null  object
 3   color       53940 non-null  object
 4   clarity     53940 non-null  object
 5   depth       53940 non-null  float64
 6   table       53940 non-null  float64
 7   price       53940 non-null  int64
 8   x           53940 non-null  float64
 9   y           53940 non-null  float64
 10  z           53940 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 4.5+ MB
dia.describe()
dia.isna().sum()
Unnamed: 0 0
carat 0
cut 0
color 0
clarity 0
depth 0
table 0
price 0
x 0
y 0
z 0
dtype: int64
dia=dia.drop(['Unnamed: 0'],axis=1)
dia.head()
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
dia.shape
(53940, 10)
Data visualization
dia.head()
sns.countplot(data=dia,x='cut')
sns.countplot(data=dia,x='color')
sns.boxplot(data=dia[['x','y','z']])
dia['cut'].value_counts().plot(kind='pie',autopct='%2.1f%%',legend=True,startangle=120)
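The pie chart's percentage labels are each cut's share of the rows. A sketch of getting the same numbers as a table with `value_counts(normalize=True)`, on a made-up Series; on the notebook's data the call would be `dia['cut'].value_counts(normalize=True) * 100`:

```python
import pandas as pd

# Made-up cut labels standing in for dia['cut']
cuts = pd.Series(["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium", "Fair", "Ideal"])
shares = cuts.value_counts(normalize=True) * 100  # percentage of rows per cut
print(shares.round(1))
```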
sns.countplot(data=dia,x='color')
plt.xlabel("color of the diamonds")
plt.ylabel('number of diamonds')
sns.countplot(data=dia,x='clarity')
plt.figure(figsize=(12,6))
sns.barplot(data=dia,x='cut',y='price')
plt.title('Price by cut of diamond')
plt.figure(figsize=(12,6))
sns.barplot(data=dia,x='color',y='price')
plt.title('Price by color of diamond')
plt.figure(figsize=(12,6))
sns.barplot(data=dia,x='clarity',y='price')
plt.title('Price by clarity of diamond')
plt.figure(figsize=(12,6))
sns.barplot(data=dia,x='color',y='price',hue='cut')
plt.title('color - cut - price of diamond')
plt.figure(figsize=(12,6))
sns.barplot(data=dia,x='cut',y='price',hue='clarity')
plt.title('cut - price - clarity of diamond')
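By default `sns.barplot` draws the mean of `y` within each category, with an error bar (a bootstrapped confidence interval) around it, so each bar above is a group mean price. A sketch with pandas `groupby` that computes the numbers behind such bars, on made-up demo rows; with the notebook's data the equivalent would be `dia.groupby('cut')['price'].mean()`:

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Premium", "Premium", "Fair"],
    "color": ["E", "J", "E", "E", "J"],
    "price": [400, 800, 500, 700, 900],
})

# x='cut': one bar per cut, at the mean price of that group
print(demo.groupby("cut")["price"].mean())

# x='cut' with hue='color': one bar per (cut, color) pair
print(demo.groupby(["cut", "color"])["price"].mean())
```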
Data preprocessing
dia.head()
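# Label-encode the categorical columns. Note that these codes do not follow
# the quality order of the grades (e.g. D, the best colour, is coded 4).
dia['cut']=dia['cut'].map({'Ideal':0,'Premium':1,'Very Good':2,'Good':3,'Fair':4})
dia['color']=dia['color'].map({'G':0,'E':1,'F':2,'H':3,'D':4,'I':5,'J':6})
dia['clarity']=dia['clarity'].map({'SI1':0,'VS2':1,'SI2':2,'VS1':3,'VVS2':4,'VVS1':5,'IF':6,'I1':7})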
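`Series.map` returns NaN for any value missing from the dictionary, and the codes used above do not follow the grading order. Linear regression and KNN read a larger code as "more" of something, so an order-preserving encoding may suit them better than arbitrary codes. A sketch of that alternative, not what this notebook ran: each grade is coded by its worst-to-best position from the data description, and unmapped values raise an error instead of silently becoming NaN. The helper name `encode_by_quality` is hypothetical:

```python
import pandas as pd

# Worst-to-best grade orders, as listed in the data description
ORDERS = {
    "cut": ["Fair", "Good", "Very Good", "Premium", "Ideal"],
    "color": ["J", "I", "H", "G", "F", "E", "D"],
    "clarity": ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"],
}

def encode_by_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each grade replaced by its position on the scale (0 = worst)."""
    out = df.copy()
    for col, order in ORDERS.items():
        codes = {grade: position for position, grade in enumerate(order)}
        out[col] = out[col].map(codes)
        # map() yields NaN for grades missing from the dict (or already-missing values)
        if out[col].isna().any():
            raise ValueError(f"column {col!r} contains unmapped or missing values")
    return out

demo = pd.DataFrame({"cut": ["Ideal", "Fair"], "color": ["D", "J"], "clarity": ["IF", "I1"]})
print(encode_by_quality(demo))
```

If this encoding replaced the one above, the hand-written inputs in the prediction cells would have to be re-encoded the same way.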
dia.head()
   carat  cut  color  clarity  depth  table  price     x     y     z
0   0.23    0      1        2   61.5   55.0    326  3.95  3.98  2.43
1   0.21    1      1        0   59.8   61.0    326  3.89  3.84  2.31
2   0.23    3      1        3   56.9   65.0    327  4.05  4.07  2.31
3   0.29    1      5        1   62.4   58.0    334  4.20  4.23  2.63
4   0.31    3      6        2   63.3   58.0    335  4.34  4.35  2.75
Converted all the categorical values into numerical values (label encoding) so that the machine learning models can use them.
Train-test splitting the dataset
x=dia.drop(['price'],axis=1)
y=dia['price']
x.shape,y.shape
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=1)
x_train.shape,y_train.shape,x_test.shape,y_test.shape
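With a float `test_size`, scikit-learn's `train_test_split` sizes the test set as ceil(test_size × n) and, when `train_size` is not given, gives the remaining rows to training. A sketch of that arithmetic for the 53,940 rows here; it only mirrors the sizing rule and does not call scikit-learn, and the helper name is illustrative:

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Train/test row counts for a float test_size: test = ceil(test_size * n), train = the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(53940, 0.2)
print(n_train, n_test)
```

So `x_train.shape, y_train.shape, x_test.shape, y_test.shape` should come out as (43152, 9), (43152,), (10788, 9) and (10788,), since x keeps the 9 columns other than price.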
DTR=DecisionTreeRegressor()
DTR
DecisionTreeRegressor()
DTR.fit(x_train,y_train)
DecisionTreeRegressor()
DTR.score(x_test,y_test)
0.9644274753307684
DTR.score(x_train,y_train)
0.9999948113472171
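`score()` on a scikit-learn regressor returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The near-perfect training score (0.99999) against 0.964 on the test set suggests the unpruned tree has largely memorised the training rows. A minimal NumPy sketch of the formula; the function name `r2` is illustrative:

```python
import numpy as np

def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return float(1.0 - ss_res / ss_tot)

y = [326, 327, 334, 335]
print(r2(y, y))                 # perfect predictions
print(r2(y, [np.mean(y)] * 4))  # always predicting the mean
```

Applied as `r2(y_test, DTR.predict(x_test))`, it should reproduce `DTR.score(x_test, y_test)`.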
yhat=DTR.predict(x_test)
a=yhat.mean()
a
3940.985956618465
plt.figure(figsize=(10,6))
# sns.distplot is deprecated; kdeplot draws the kernel density curve that distplot(hist=False) drew
sns.kdeplot(y_test,color='r',label='Actual values')
sns.kdeplot(yhat,color='b',label='Fitted values')
plt.title("Actual vs fitted prices on the test set")
plt.legend()
rf=RandomForestRegressor()
rf
RandomForestRegressor()
rf.fit(x_train,y_train)
RandomForestRegressor()
rf.score(x_train,y_train)
0.9973538696069512
rf.score(x_test,y_test)
0.9811573144595662
yhat_rf=rf.predict(x_test)
b=yhat_rf.mean()
b
3943.780428965385
plt.figure(figsize=(10,5))
sns.kdeplot(y_test,color='g',label='Actual values')
sns.kdeplot(yhat_rf,color='b',label='Fitted values')
plt.legend()
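The cells above summarise each model partly by the mean of its predictions (a, b), but two models can share the same average prediction while erring very differently on individual diamonds. Per-diamond error measures such as mean absolute error (MAE) and root mean squared error (RMSE), both in dollars, are a common complement. A NumPy sketch on made-up numbers; the function names are illustrative:

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    """Mean absolute error, in the units of the target (US dollars here)."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def rmse(y_true, y_pred) -> float:
    """Root mean squared error; penalises large misses more heavily than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2)))

y_true = np.array([500.0, 1000.0, 1500.0])
good = np.array([550.0, 950.0, 1500.0])  # small per-diamond errors
bad = np.array([1500.0, 1000.0, 500.0])  # large per-diamond errors
print(good.mean(), bad.mean())           # both average 1000.0
print(mae(y_true, good), mae(y_true, bad))
print(rmse(y_true, good), rmse(y_true, bad))
```

With the notebook's variables this would be, for example, `mae(y_test, yhat_rf)`.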
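# Bin each price into one of four bands
price1=[]
for i in dia['price']:
    if i<5000:
        price1.append('least')
    elif i<10000:   # a chained check like (i>5001 & i<10000) misparses, since & binds tighter than > and <
        price1.append('min')
    elif i<15000:
        price1.append('medium')
    else:
        price1.append('max')
print(price1)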
['least', 'least', 'least', 'least', 'least', 'least', 'least', 'least', 'least', 'least
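The same price bands can be built without a loop using `pd.cut`, which assigns each value to a labelled interval. A sketch, assuming band edges at 5000, 10000 and 15000 with left-closed intervals (so 5000 itself falls in 'min'); the demo prices are made up apart from the dataset's minimum and maximum:

```python
import numpy as np
import pandas as pd

prices = pd.Series([326, 4999, 5000, 9999, 12000, 15000, 18823])
bands = pd.cut(
    prices,
    bins=[-np.inf, 5000, 10000, 15000, np.inf],
    labels=["least", "min", "medium", "max"],
    right=False,  # intervals are [low, high): 5000 goes to 'min', 15000 to 'max'
)
print(bands.tolist())
```

On the notebook's data this would be `pd.cut(dia['price'], ...)` with the same bins and labels.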
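K-nearest neighbours
# Note: y_train holds continuous prices, so KNeighborsClassifier treats every distinct
# price as a separate class, and score() below reports exact-match accuracy rather than R².
k = 5
knn_classifier = KNeighborsClassifier(n_neighbors=k)
knn_classifier.fit(x_train,y_train)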
KNeighborsClassifier()
knn_classifier.score(x_train,y_train)
0.20012977382276603
knn_classifier.score(x_test,y_test)
0.023451983685576567
yhat_knn=knn_classifier.predict(x_test)
c=yhat_knn.mean()
c
3081.556915090842
plt.figure(figsize=(12,6))
sns.kdeplot(y_test,color='g',label='Actual values')
sns.kdeplot(yhat_knn,color='r',label='Fitted values')
plt.legend()
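Because price is continuous, a regressor is the more natural KNN variant here, and KNN distances are sensitive to feature scale: carat values in the head() output are below 1, while depth and table are around 55 to 65. A sketch of a swapped-in alternative, not what the notebook ran: a scikit-learn pipeline of `StandardScaler` and `KNeighborsRegressor`, whose `score()` returns R² comparable with the other models. The synthetic data only stands in for x_train/y_train:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: one small-scale column (carat-like) and one large-scale column (table-like)
X = np.column_stack([rng.uniform(0.2, 5.0, 500), rng.uniform(50, 70, 500)])
y = 4000 * X[:, 0] + rng.normal(0, 200, 500)  # price driven by the first column only

# Scale features to zero mean and unit variance before measuring neighbour distances
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn.fit(X[:400], y[:400])
print(knn.score(X[400:], y[400:]))  # R² on the held-out rows
```

With the notebook's variables the fit would be `knn.fit(x_train, y_train)` followed by `knn.score(x_test, y_test)`.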
lr1=LinearRegression()
lr1
LinearRegression()
lr1.fit(x_train,y_train)
LinearRegression()
lr1.score(x_train,y_train)
0.8698060950121344
lr1.score(x_test,y_test)
0.870443409999593
yhlr=lr1.predict(x_test)
d=yhlr.mean()
d
3950.3619654934037
plt.figure(figsize=(18,4))
sns.kdeplot(y_test,color='y',label='Actual values')
sns.kdeplot(yhlr,color='r',label='Fitted values')
plt.legend()
dia.head()
Prediction 1
input_data=[0.23,3,1,3,56.9,65.0,4.05,4.07,2.31]  # encoded row 2 of the data (Good, E, VS1); its actual price is $327
inp_array=np.asarray(input_data)
inp_array
inp_rshape=inp_array.reshape(1,-1)  # reshape to one sample with 9 features
inp_rshape
prediction1 = DTR.predict(inp_rshape)
print("The predicted price for the above test data of diamond is", prediction1, "dollars")
The predicted price for the above test data of diamond is [327.] dollars
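The models were fitted on the DataFrame x, so scikit-learn records its column names and warns when `predict` receives a bare NumPy array (the warning is hidden here by `warnings.filterwarnings("ignore")`). A sketch of a hypothetical helper that builds a one-row DataFrame in the training column order from named, already-encoded features; `FEATURES` is assumed to match `x.columns`:

```python
import pandas as pd

# Assumed to match x.columns in the notebook: every column except price
FEATURES = ["carat", "cut", "color", "clarity", "depth", "table", "x", "y", "z"]

def make_input_row(**features: float) -> pd.DataFrame:
    """Build a one-row DataFrame with the training column names and order (hypothetical helper)."""
    missing = set(FEATURES) - set(features)
    extra = set(features) - set(FEATURES)
    if missing or extra:
        raise ValueError(f"missing: {sorted(missing)}, unexpected: {sorted(extra)}")
    return pd.DataFrame([[features[name] for name in FEATURES]], columns=FEATURES)

# The Prediction 1 diamond, with cut/color/clarity already label-encoded
row = make_input_row(carat=0.23, cut=3, color=1, clarity=3, depth=56.9,
                     table=65.0, x=4.05, y=4.07, z=2.31)
print(row)
# In the notebook, the row could then be passed directly: DTR.predict(row)
```

Naming each feature also guards against silently swapping positions, which a bare list like `input_data` cannot catch.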
Prediction 2
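input_data1=[0.36,2,0,1,55.6,62.2,3.89,4.01,2.99]  # encoded: Very Good, G, VS2
# Note: the next line converts input_data (the Prediction 1 diamond), not input_data1,
# so the output below is the random forest's price for that diamond;
# use np.asarray(input_data1) to price the new diamond.
inp_arr=np.asarray(input_data)
inp_rshap=inp_arr.reshape(1,-1)
prediction2=rf.predict(inp_rshap)
print("The predicted price for the above test data of diamond is", prediction2, "dollars")
The predicted price for the above test data of diamond is [350.17] dollars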