45 AIML Practical 09

Name of Student: Ahmed Mobin Ahmed Shaikh
Roll Number: 45
Lab Practical Number: 09
Title of Lab Assignment: Implementation of Bagging Algorithm: Decision Tree / Random Forest
DOP: 19/03/24
DOS: 27/03/24
CO Mapped: CO5
PO Mapped: PO2, PO3, PO4, PO5, PO6, PO7, PSO1, PSO2
Signature:

Aim: Implementation of Bagging Algorithm: Decision Tree / Random Forest.


Bagging, short for Bootstrap Aggregating, is a popular ensemble learning technique in machine learning. It trains multiple models independently on bootstrap samples of the training data (random subsets drawn with replacement) and then combines their predictions, by averaging for regression or majority voting for classification. The basic idea is to reduce the variance of a single model and thereby improve overall performance.

Bagging offers several advantages:

- Reduction of variance: training multiple models on different subsets of the data reduces the variance of the final prediction, which improves the generalization performance of the ensemble.

- Improved stability: because predictions are combined across multiple models, the ensemble is more robust to outliers and noisy data.

- Parallelizable: each model in a bagging ensemble is trained independently, so training can easily be parallelized for efficient use of computational resources.

- Works with any base learner: bagging can wrap any base learning algorithm, making it a versatile technique applicable to a wide range of problems.

However, bagging does not always lead to improvements, especially if the base learning algorithm already has low variance and is robust to noise. It also increases computational cost and memory requirements, since multiple models must be trained and stored.
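To make this concrete before turning to the dataset below, here is a minimal sketch of bagging decision trees with scikit-learn's BaggingClassifier. The synthetic data and all parameter values are illustrative assumptions, not part of this practical; on scikit-learn versions older than 1.2 the keyword is base_estimator rather than estimator.

# Illustrative sketch only: the synthetic dataset and parameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Each tree is fitted on a bootstrap sample (drawn with replacement) of the
# training set; the ensemble predicts by majority vote over the 10 trees.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=10, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print(f"Bagged-tree test accuracy: {bag.score(X_te, y_te):.4f}")

A Random Forest, used in the rest of this practical, is the same recipe with one extra source of randomness: each split within each tree considers only a random subset of the features.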

Bagging on Dataset - Chronic Kidney Disease


Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

Get the Data


# Load the dataset
ahmed_ds = pd.read_csv('/content/kidney_disease_train.csv')
# target class label : classification
# (ckd - chronic kidney disease, notckd - no chronic kidney disease)

ahmed_ds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280 entries, 0 to 279
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 280 non-null int64
1 age 275 non-null float64
2 bp 271 non-null float64
3 sg 244 non-null float64
4 al 245 non-null float64
5 su 242 non-null float64
6 rbc 173 non-null object
7 pc 230 non-null object
8 pcc 276 non-null object
9 ba 276 non-null object
10 bgr 247 non-null float64
11 bu 266 non-null float64
12 sc 268 non-null float64
13 sod 213 non-null float64
14 pot 212 non-null float64
15 hemo 241 non-null float64
16 pcv 229 non-null float64
17 wc 203 non-null object
18 rc 187 non-null object
19 htn 279 non-null object
20 dm 279 non-null object
21 cad 279 non-null object
22 appet 280 non-null object
23 pe 280 non-null object
24 ane 280 non-null object
25 classification 280 non-null object
dtypes: float64(12), int64(1), object(13)
memory usage: 57.0+ KB

ahmed_ds.isnull().sum()

id 0
age 5
bp 9
sg 36
al 35
su 38
rbc 107
pc 50
pcc 4
ba 4
bgr 33
bu 14
sc 12
sod 67
pot 68
hemo 39
pcv 51
wc 77
rc 93
htn 1
dm 1
cad 1
appet 0
pe 0
ane 0
classification 0
dtype: int64

ahmed_ds = ahmed_ds.dropna()  # drop every row with a missing value (280 -> 107 rows)

ahmed_ds.isnull().sum()

id 0
age 0
bp 0
sg 0
al 0
su 0
rbc 0
pc 0
pcc 0
ba 0
bgr 0
bu 0
sc 0
sod 0
pot 0
hemo 0
pcv 0
wc 0
rc 0
htn 0
dm 0
cad 0
appet 0
pe 0
ane 0
classification 0
dtype: int64
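Dropping rows with any missing value keeps only 107 of the 280 records, as the outputs below confirm. A hedged alternative, not used in this practical, is to impute the missing values instead; the sketch below assumes the raw file is reloaded into a hypothetical ahmed_ds_raw.

# Hypothetical alternative (not used in this practical): impute numeric
# columns instead of dropping rows that contain missing values.
from sklearn.impute import SimpleImputer

ahmed_ds_raw = pd.read_csv('/content/kidney_disease_train.csv')
num_cols = ahmed_ds_raw.select_dtypes(include='number').columns
imputer = SimpleImputer(strategy='median')  # median is robust to outliers
ahmed_ds_raw[num_cols] = imputer.fit_transform(ahmed_ds_raw[num_cols])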

ahmed_ds.describe()


       id         age        bp         sg        al        su        bgr        bu         sc        sod        pot
count  107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000
mean   273.551402 49.682243  73.084112  1.020047   0.794393   0.233645   130.130841 51.635514  1.971963  138.869159 4.812150
std    99.999362  16.377964  10.764303  0.005429   1.419130   0.759586   54.841123  45.669525  2.600101  7.287990   4.175122
min    11.000000  6.000000   50.000000  1.005000   0.000000   0.000000   70.000000  10.000000  0.400000  114.000000 2.900000
25%    235.500000 38.000000  60.000000  1.020000   0.000000   0.000000   99.000000  27.000000  0.700000  135.000000 3.800000
50%    298.000000 52.000000  70.000000  1.020000   0.000000   0.000000   118.000000 39.000000  1.000000  139.000000 4.600000
75%    352.000000 61.500000  80.000000  1.025000   1.000000   0.000000   131.000000 49.500000  1.250000  144.000000 4.900000
max    399.000000 83.000000  100.000000 1.025000   4.000000   4.000000   380.000000 309.000000 13.300000 150.000000 47.000000

ahmed_ds.columns

Index(['id', 'age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr',
'bu', 'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
'appet', 'pe', 'ane', 'classification'],
dtype='object')

Select features and target variable


x = ahmed_ds.iloc[:, 11:13]  # Features: bu (blood urea) and sc (serum creatinine)
x

     bu     sc
0    42.0   1.7
3    25.0   1.0
6    49.0   0.9
10   18.0   1.1
12   20.0   0.5
..   ...    ...
272  18.0   1.1
273  148.0  3.9
275  92.0   3.3
277  34.0   1.1
278  19.0   0.5

107 rows × 2 columns


y = ahmed_ds.iloc[:, 25]  # Target: 'classification' (column index 25)


y

0 ckd
3 notckd
6 notckd
10 notckd
12 notckd
...
272 notckd
273 ckd
275 ckd
277 notckd
278 notckd
Name: classification, Length: 107, dtype: object

Split the dataset into train and test sets


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

Feature scaling


# Note: tree ensembles are insensitive to feature scaling; it is applied here
# mainly so the decision-boundary plot below uses comparable axes.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

Initialize and train the Random Forest classifier


classifier = RandomForestClassifier(n_estimators=5, random_state=0)
classifier.fit(x_train, y_train)

RandomForestClassifier(n_estimators=5, random_state=0)
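As an aside not in the original notebook, the bagging idea can be made visible by aggregating the five fitted trees by hand. Note that RandomForestClassifier.predict averages the trees' predicted class probabilities rather than taking a strict hard vote, so the two can differ on ties; with fully grown trees they normally agree.

# Sketch: recover the forest's prediction as a majority vote over its trees.
# Each fitted tree predicts encoded class indices; classifier.classes_
# maps them back to the original labels ('ckd' / 'notckd').
per_tree = np.array([est.predict(x_test).astype(int)
                     for est in classifier.estimators_])
votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                            axis=0, arr=per_tree)
manual_pred = classifier.classes_[votes]
print("Agreement with classifier.predict:",
      (manual_pred == classifier.predict(x_test)).mean())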

Make predictions on the test set


y_pred = classifier.predict(x_test)

Evaluate the model


cm = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:\n", cm)

CONFUSION MATRIX:
[[ 5 1]
[ 0 21]]

from sklearn.metrics import classification_report, accuracy_score

# Build the classification report as a DataFrame for readable printing
clf_report = pd.DataFrame(classification_report(y_test, y_pred, output_dict=True))

# Print accuracy score
print(f"ACCURACY SCORE:\n{accuracy_score(y_test, y_pred):.4f}")

# Print classification report
print(f"CLASSIFICATION REPORT:\n{clf_report}")

ACCURACY SCORE:
0.9630
CLASSIFICATION REPORT:
ckd notckd accuracy macro avg weighted avg
precision 1.000000 0.954545 0.962963 0.977273 0.964646
recall 0.833333 1.000000 0.962963 0.916667 0.962963
f1-score 0.909091 0.976744 0.962963 0.942918 0.961710
support 6.000000 21.000000 0.962963 27.000000 27.000000
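As a sanity check (an addition, not in the original notebook), the headline figures in the report follow directly from the confusion matrix printed above, whose rows are the true classes and columns the predicted classes, in alphabetical order ('ckd', 'notckd').

# Hand-check the report from cm = [[5, 1], [0, 21]] (rows: true, cols: predicted)
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()   # (5 + 21) / 27 ≈ 0.9630
recall_ckd = cm[0, 0] / cm[0, :].sum()        # 5 / 6  ≈ 0.8333
precision_ckd = cm[0, 0] / cm[:, 0].sum()     # 5 / 5  = 1.0000
print(accuracy, recall_ckd, precision_ckd)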

Plotting the decision boundary for training set


# Train-set plot: shade the forest's decision regions and overlay the points
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = np.meshgrid(np.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
# Encode the string labels ('ckd'/'notckd') as 0/1 so contourf can shade the regions
z = (classifier.predict(np.array([x1.ravel(), x2.ravel()]).T)
     == classifier.classes_[1]).astype(int).reshape(x1.shape)
plt.contourf(x1, x2, z, alpha=0.4, cmap=ListedColormap(('red', 'green')))
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=('red', 'green')[i], label=j)
plt.title('Random Forest (Training set)')
plt.xlabel('Blood Urea')
plt.ylabel('Serum creatinine')
plt.legend()
plt.show()


Trees of a RandomForestClassifier


# Plot the RandomForestClassifier's five trees
from sklearn import tree
fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(10, 2), dpi=900)
for index in range(5):
    tree.plot_tree(classifier.estimators_[index], feature_names=x.columns,
                   class_names=classifier.classes_, filled=True, ax=axes[index])
    axes[index].set_title('Estimator: ' + str(index + 1), fontsize=11)
fig.savefig('Random Forest 5 Trees.png')

Conclusion
In conclusion, bagging is a versatile and effective ensemble learning technique that improves model performance by combining the predictions of multiple base learners trained on bootstrapped subsets of the data. In this practical, a Random Forest of just five trees classified chronic kidney disease from two features (blood urea and serum creatinine) with roughly 96% accuracy on the held-out test set.

