About The Dataset - Car Evaluation Dataset (UCI Machine Learning Repository)
Attribute hierarchy:
- CAR: car acceptability
  - PRICE: overall price
    - buying: buying price
    - maint: price of the maintenance
  - TECH: technical characteristics
    - COMFORT: comfort
      - doors: number of doors
      - persons: capacity in terms of persons to carry
      - lug_boot: the size of luggage boot
    - safety: estimated safety of the car

Attribute values:
- buying: v-high, high, med, low
- maint: v-high, high, med, low
- doors: 2, 3, 4, 5-more
- persons: 2, 4, more
- lug_boot: small, med, big
- safety: low, med, high
Missing Attribute Values: none
The data has been modified and stored in an Excel sheet; we will convert the string values to integers so that the scikit-learn estimators can work with them.
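The loading step described above can be sketched as follows. The file name `car_evaluation.xlsx` is a placeholder (the notebook does not show the real path), and the column names come from the UCI dataset description:

```python
import pandas as pd

# Column names from the UCI Car Evaluation description. The Excel file name
# below is a placeholder -- the notebook does not show the actual path.
COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]

def load_car_data(path="car_evaluation.xlsx"):
    """Read the modified Excel sheet, keeping only the expected columns."""
    data = pd.read_excel(path)
    return data.loc[:, [c for c in data.columns if c in COLUMNS]]

# Before encoding, every attribute is a string ("object" dtype), matching data.info():
sample = pd.DataFrame(
    [["v-high", "v-high", "2", "2", "small", "low", "unacc"]], columns=COLUMNS
)
print(sample.dtypes.unique())  # all object
```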
In [21]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 8 columns):
Unnamed: 0 1728 non-null int64
buying 1728 non-null object
maint 1728 non-null object
doors 1728 non-null object
persons 1728 non-null object
lug_boot 1728 non-null object
safety 1728 non-null object
class 1728 non-null object
dtypes: int64(1), object(7)
memory usage: 108.1+ KB
localhost:8888/nbconvert/html/Desktop/Python Data Products for Predictive Anlaytics Specialization/Course 2 - Design Thinking and Predictive Analytics for Data P… 1/5
8/21/2020 Course 2 - Final Project
In [22]: data.head()
Out[22]:
Unnamed: 0 buying maint doors persons lug_boot safety class
(first five rows, still string-valued, not preserved in this export)
In [26]: from sklearn.preprocessing import LabelEncoder  # import needed for the encoder below

le = LabelEncoder()
# Replace each column's string categories with integer codes
for i in data.columns:
    data[i] = le.fit_transform(data[i])
In [27]: data.head()
Out[27]:
Unnamed: 0 buying maint doors persons lug_boot safety class
0 0 3 3 0 0 2 1 2
1 1 3 3 0 0 2 2 2
2 2 3 3 0 0 2 0 2
3 3 3 3 0 0 1 1 2
4 4 3 3 0 0 1 2 2
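The codes in the table above come from `LabelEncoder`, which numbers categories alphabetically (so "high" < "low" < "med" < "v-high" becomes 0, 1, 2, 3), ignoring the natural price order. An alternative, not what the notebook does, is an explicit ordered mapping:

```python
import pandas as pd

# Explicit natural ordering for the price-like attributes (an alternative to
# LabelEncoder's alphabetical codes; not used in the notebook itself).
PRICE_ORDER = ["low", "med", "high", "v-high"]

buying = pd.Series(["v-high", "v-high", "med", "low"])
encoded = pd.Categorical(buying, categories=PRICE_ORDER, ordered=True).codes
print(list(encoded))  # [3, 3, 1, 0]
```

With this encoding, larger codes always mean higher price, which distance-based models such as KNN can exploit.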
We also draw a heatmap of the dataset's columns, annotated with Pearson's correlation coefficients.
In [29]: fig = plt.figure(figsize=(10, 6))
sns.heatmap(data.corr(), annot=True).set_title("Heatmap showing Pearson's correlation coefficient")
The X DataFrame holds the input features, and y is the Series of class labels that we will try to predict.
In [30]: X = data[data.columns[:-1]]
y = data['class']
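The cell that produces `X_train`, `X_test`, `y_train`, and `y_test` (used below) is not visible in this export. A conventional split would look like the sketch here; the `test_size` and `random_state` values are assumptions, and synthetic data stands in for the car dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data; in the notebook X and y come from the encoded car DataFrame.
X = pd.DataFrame({"f1": range(100), "f2": range(100)})
y = pd.Series([0, 1] * 50)

# test_size=0.25 and random_state=42 are assumptions, not values from the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 75 25
```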
In [31]: data.head()
Out[31]:
Unnamed: 0 buying maint doors persons lug_boot safety class
0 0 3 3 0 0 2 1 2
1 1 3 3 0 0 2 2 2
2 2 3 3 0 0 2 0 2
3 3 3 3 0 0 1 1 2
4 4 3 3 0 0 1 2 2
In [37]: regression.fit(X_train, y_train)
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
  "number of iterations.", ConvergenceWarning)
In [39]: regression.score(X_test,y_test)
Out[39]: 0.697495183044316
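The ConvergenceWarning above means the newton-cg solver hit its iteration cap before converging. Raising `max_iter` usually fixes this; the exact settings of the notebook's `regression` object are not shown, so this is a sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data stands in for the encoded car features.
X_demo, y_demo = make_classification(
    n_samples=200, n_classes=3, n_informative=4, random_state=0
)

# max_iter=1000 is an assumed value; the default (100) is what typically
# triggers the ConvergenceWarning seen above.
clf = LogisticRegression(solver="newton-cg", max_iter=1000)
clf.fit(X_demo, y_demo)
print(round(clf.score(X_demo, y_demo), 2))
```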
In [41]: knn.fit(X_train,y_train)
pred = knn.predict(X_test)
knn.score(X_test,y_test)
Out[41]: 0.6608863198458574
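The notebook's `n_neighbors` value is not shown, and KNN accuracy is sensitive to it. A simple sweep over a few candidate values on held-out data, sketched here on synthetic data, is one way to check whether the 0.66 score could be improved:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the candidate k values below are illustrative.
X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {
    k: KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    for k in (1, 3, 5, 7, 9)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```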
In [43]: # We see that the accuracy score of the KNN classifier is fairly close to that of logistic regression
In [44]: print(classification_report(y_test,pred))
In [45]: # The classification report above gives a detailed per-class breakdown of the KNN
# classifier's precision, recall, F1, and support
# We can observe a good F1 score here
Conclusion
Logistic Regression gave us better accuracy (0.697) than the KNN classifier (0.661).
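A single train/test split can be noisy, so the gap between the two scores may not be stable. A 5-fold cross-validated comparison (not performed in the notebook, sketched here on synthetic data) would make the conclusion firmer:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; in the notebook this would be the encoded X and y.
X, y = make_classification(n_samples=300, random_state=0)

means = {}
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier())]:
    # Average accuracy over 5 folds instead of one arbitrary split.
    means[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(means[name], 3))
```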