Random Forest - Car - Jupyter Notebook

The document is a Jupyter Notebook detailing the implementation of a Random Forest classifier using a car evaluation dataset. It includes data loading, preprocessing, model training, and evaluation, achieving an accuracy score of 0.9649 with 100 decision trees. Additionally, it visualizes feature importance scores for the model's predictors.

Uploaded by

Aastha Mehta

7/24/24, 1:07 PM Random forest - Jupyter Notebook

In [6]: import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # statistical data visualization
%matplotlib inline

In [7]: df = pd.read_csv("C:\\Users\\Welcome\\Downloads\\car_evaluation.csv")

In [8]: df.shape

Out[8]: (1727, 7)

In [9]: df.head()

Out[9]:
   vhigh vhigh.1  2 2.1  small   low  unacc
0  vhigh   vhigh  2   2  small   med  unacc
1  vhigh   vhigh  2   2  small  high  unacc
2  vhigh   vhigh  2   2    med   low  unacc
3  vhigh   vhigh  2   2    med   med  unacc
4  vhigh   vhigh  2   2    med  high  unacc

In [10]: col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df.columns = col_names
col_names

Out[10]: ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

In [11]: df.head()

Out[11]:
  buying  maint  doors  persons lug_boot safety  class
0  vhigh  vhigh      2        2    small    med  unacc
1  vhigh  vhigh      2        2    small   high  unacc
2  vhigh  vhigh      2        2      med    low  unacc
3  vhigh  vhigh      2        2      med    med  unacc
4  vhigh  vhigh      2        2      med   high  unacc
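The header of Out[9] (vhigh, vhigh.1, 2, 2.1, ...) suggests that the CSV file has no header row, so pandas used the first record as column names and that record is missing from the 1727 rows in Out[8]. The sketch below shows one way to keep it, using a hypothetical three-row stand-in for the file rather than the real data. Loading the real file this way would give one more row than Out[8] reports, so the numbers recorded later in this notebook would likely shift slightly.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for car_evaluation.csv (not the real file).
sample_csv = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
    "vhigh,vhigh,2,2,small,high,unacc\n"
)

col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# Default behaviour: the first record is consumed as the header row.
df_default = pd.read_csv(sample_csv)

# Rewind the buffer and read again, telling pandas there is no header row
# and supplying the column names directly.
sample_csv.seek(0)
df_named = pd.read_csv(sample_csv, header=None, names=col_names)

print(df_default.shape)  # one row fewer than the file contains
print(df_named.shape)    # every record kept as data
```

Passing names at load time also makes the separate df.columns assignment unnecessary.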

In [12]: X = df.drop(['class'], axis=1)
y = df['class']

In [13]: # split data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [14]: # check the shape of X_train and X_test
X_train.shape, X_test.shape

Out[14]: ((1157, 6), (570, 6))
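The shapes in Out[14] can be checked by hand. The sketch below assumes that the test count is the requested fraction rounded up and that the training set receives the remainder; it reproduces the recorded sizes without calling scikit-learn.

```python
import math

n_samples = 1727   # rows in df, from Out[8]
test_size = 0.33   # fraction passed to train_test_split

# Assumption: the test count is the fraction rounded up, and the
# training set receives whatever remains.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```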

In [15]: from tensorflow.keras.utils import to_categorical # imported but never used in the rest of this notebook

localhost:8888/notebooks/downloads/Random forest.ipynb 1/4


In [19]: conda install -c conda-forge category_encoders

Collecting package metadata (current_repodata.json): ...working... done


Solving environment: ...working... done

## Package Plan ##

environment location: C:\Users\Welcome\anaconda3

added / updated specs:


- category_encoders

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    category_encoders-2.2.2    |     pyhd3eb1b0_0          58 KB
    conda-4.14.0               |   py37h03978a9_0        1018 KB  conda-forge
    python_abi-3.7             |          2_cp37m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         1.1 MB

The following NEW packages will be INSTALLED:

category_encoders pkgs/main/noarch::category_encoders-2.2.2-pyhd3eb1b0_0
python_abi conda-forge/win-64::python_abi-3.7-2_cp37m

The following packages will be UPDATED:

conda pkgs/main::conda-4.8.2-py37_0 --> conda-forge::conda-4.14.0-py37h03978a9_0

Downloading and Extracting Packages

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Note: you may need to restart the kernel to use updated packages.

==> WARNING: A newer version of conda exists. <==


current version: 4.8.2
latest version: 24.5.0

Please update conda by running

$ conda update -n base -c defaults conda

In [20]: import category_encoders as ce

In [21]: encoder = ce.OrdinalEncoder(cols=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])

X_train = encoder.fit_transform(X_train)
X_test = encoder.transform(X_test)

C:\Users\Welcome\anaconda3\lib\site-packages\category_encoders\utils.py:21: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
  elif pd.api.types.is_categorical(cols):

In [22]: X_train.head()

Out[22]:
      buying  maint  doors  persons  lug_boot  safety
83         1      1      1        1         1       1
48         1      1      2        2         1       2
468        2      1      2        3         2       2
155        1      2      2        2         1       1
1043       3      2      3        2         2       1

In [23]: X_test.head()

Out[23]:
      buying  maint  doors  persons  lug_boot  safety
599        2      2      3        1         3       1
932        3      1      3        3         3       1
628        2      2      1        1         3       3
1497       4      2      1        3         1       2
1262       3      4      3        2         1       1
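The first training row in Out[22] is encoded as all 1s, which is consistent with each category being numbered in order of first appearance in the training data. The sketch below is a plain-Python illustration of that idea, not the category_encoders implementation. The helper names, the sample values, and the choice of -1 for categories unseen during fitting are assumptions made for the example.

```python
def fit_ordinal(values):
    """Map each distinct value to an integer, numbered from 1 in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform_ordinal(values, mapping, unknown=-1):
    """Replace each value by its code; values not seen during fitting get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Hypothetical 'safety' values, not taken from the real dataset.
train_safety = ['med', 'high', 'high', 'med', 'low']
test_safety = ['low', 'med', 'very_high']

mapping = fit_ordinal(train_safety)              # learned from training data only
train_codes = transform_ordinal(train_safety, mapping)
test_codes = transform_ordinal(test_safety, mapping)

print(mapping)
print(train_codes)
print(test_codes)
```

Under this scheme the integers follow order of appearance, not the natural ranking low < med < high. If the ranking matters, an explicit mapping per column would have to be supplied to the encoder.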

In [24]: # import Random Forest classifier
from sklearn.ensemble import RandomForestClassifier

# instantiate the classifier; n_estimators is left at its default, which
# appears to be 100 in this environment (the repr in Out[26] omits
# n_estimators=100, as it would for a value equal to the default)
rfc = RandomForestClassifier(random_state=0)

# fit the model
rfc.fit(X_train, y_train)

# predict the test set results
y_pred = rfc.predict(X_test)

# check accuracy score
from sklearn.metrics import accuracy_score

print(f'Model accuracy score with 100 decision-trees (default) : {accuracy_score(y_test, y_pred):0.4f}')

Model accuracy score with 100 decision-trees (default) : 0.9649

In [25]: # instantiate the classifier with n_estimators = 100
rfc_100 = RandomForestClassifier(n_estimators=100, random_state=0)

# fit the model to the training set
rfc_100.fit(X_train, y_train)

# predict on the test set
y_pred_100 = rfc_100.predict(X_test)

# check accuracy score
print(f'Model accuracy score with 100 decision-trees : {accuracy_score(y_test, y_pred_100):0.4f}')

Model accuracy score with 100 decision-trees : 0.9649
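Accuracy is the fraction of test rows whose predicted class equals the true class. The sketch below computes it by hand on a few hypothetical labels, then works backwards from the recorded score to find how many of the 570 test rows (Out[14]) must have been classified correctly. The helper name and the sample labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true_demo = ['unacc', 'acc', 'good', 'unacc', 'vgood']
y_pred_demo = ['unacc', 'acc', 'acc', 'unacc', 'vgood']
demo_score = accuracy(y_true_demo, y_pred_demo)
print(demo_score)

# Which count of correct predictions out of 570 prints as 0.9649?
n_test = 570
matches = [k for k in range(n_test + 1) if f"{k / n_test:0.4f}" == "0.9649"]
print(matches)
```

Only one count matches, so the reported 0.9649 corresponds to 550 correct predictions and 20 errors on the test set.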


In [26]: # create the classifier with n_estimators = 100
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# fit the model to the training set
clf.fit(X_train, y_train)

Out[26]: RandomForestClassifier(random_state=0)

In [27]: # view the feature scores
feature_scores = pd.Series(clf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
feature_scores

Out[27]: safety      0.291657
         persons     0.235380
         buying      0.160692
         maint       0.134143
         lug_boot    0.111595
         doors       0.066533
         dtype: float64
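The scores in Out[27] are relative shares: scikit-learn normalizes the impurity-based importances so that they sum to 1. The sketch below copies the six values from Out[27] into a plain dictionary, checks the total, and prints a running sum to show how much of the total the leading features account for.

```python
# Values copied from Out[27].
feature_scores = {
    'safety': 0.291657,
    'persons': 0.235380,
    'buying': 0.160692,
    'maint': 0.134143,
    'lug_boot': 0.111595,
    'doors': 0.066533,
}

# The importances should add up to 1 (up to rounding in the printed values).
total = sum(feature_scores.values())
print(round(total, 6))

# Running share, from most to least important feature.
running = 0.0
cumulative = {}
for name, score in sorted(feature_scores.items(), key=lambda kv: kv[1], reverse=True):
    running += score
    cumulative[name] = running
    print(f"{name:<9} {score:.6f}  cumulative {running:.6f}")
```

Read this way, safety and persons together carry a little over half of the total importance, while doors contributes under 7 percent.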

In [28]: # create a seaborn bar plot
sns.barplot(x=feature_scores, y=feature_scores.index)

# add labels to the graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')

# add title to the graph
plt.title("Visualizing Important Features")

# visualize the graph
plt.show()
