45 AIML Practical 09
Reduction of Variance: By training multiple models on different bootstrap samples of the data and averaging (or voting over) their predictions, bagging reduces the variance of the final prediction. This tends to improve the generalization performance of the ensemble model.
Improved Stability: Because no single model decides the output, bagging makes the ensemble less sensitive to outliers and noisy training samples.
Parallelizable: Each model in a bagging ensemble is trained independently, so training is easy to parallelize and makes efficient use of available computational resources.
Works with Any Base Learner: Bagging can wrap essentially any base learning algorithm, which makes it a versatile technique applicable to a wide range of problems.
However, bagging does not always lead to improvements. If the base learner is already stable (low variance), such as a linear model, the copies trained on different bootstrap samples make nearly identical predictions, and averaging them changes little. Bagging also increases training time and memory use roughly in proportion to the number of models in the ensemble.
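A minimal from-scratch sketch of the bagging procedure described above, using only the Python standard library: each model is trained on its own bootstrap sample (rows drawn with replacement) and the ensemble predicts by majority vote. The decision-stump base learner, the helper names (`fit_stump`, `bagging_fit`, `bagging_predict`) and the toy one-feature data are hypothetical choices for illustration; this is not how scikit-learn implements bagging internally.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature decision stump (hypothetical base learner).

    Tries every observed value as a threshold and keeps the first one with the
    fewest training errors, predicting the majority label on each side.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x <= threshold else high_label


def bagging_fit(xs, ys, n_models=5, seed=0):
    """Train n_models stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n row indices drawn uniformly *with* replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the ensemble by majority vote (classification)."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Hypothetical toy data: one numeric feature, two classes
xs = [0.5, 0.7, 0.9, 1.0, 1.1, 2.5, 3.0, 4.2, 5.1, 6.0]
ys = ["notckd"] * 5 + ["ckd"] * 5

models = bagging_fit(xs, ys, n_models=5, seed=0)
print(bagging_predict(models, 0.4), bagging_predict(models, 7.0))
```

Five models were used to mirror `n_estimators=5` later in this practical; an odd count also avoids tied votes with two classes.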
ahmed_ds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280 entries, 0 to 279
Data columns (total 26 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   id              280 non-null    int64
 1   age             275 non-null    float64
 2   bp              271 non-null    float64
 3   sg              244 non-null    float64
 4   al              245 non-null    float64
 5   su              242 non-null    float64
 6   rbc             173 non-null    object
 7   pc              230 non-null    object
 8   pcc             276 non-null    object
 9   ba              276 non-null    object
10   bgr             247 non-null    float64
11   bu              266 non-null    float64
12   sc              268 non-null    float64
13   sod             213 non-null    float64
14   pot             212 non-null    float64
15   hemo            241 non-null    float64
16   pcv             229 non-null    float64
17   wc              203 non-null    object
18   rc              187 non-null    object
19   htn             279 non-null    object
https://colab.research.google.com/drive/1_WlvTehI7_fmaWH1e1oR00xjQk3cXUZx#scrollTo=LAWVF6H9LQYz&printMode=true 1/5
4/5/24, 12:51 AM 45_AIML_Practical_09.ipynb - Colaboratory
20   dm              279 non-null    object
21   cad             279 non-null    object
22   appet           280 non-null    object
23   pe              280 non-null    object
24   ane             280 non-null    object
25   classification  280 non-null    object
dtypes: float64(12), int64(1), object(13)
memory usage: 57.0+ KB
ahmed_ds.isnull().sum()
id 0
age 5
bp 9
sg 36
al 35
su 38
rbc 107
pc 50
pcc 4
ba 4
bgr 33
bu 14
sc 12
sod 67
pot 68
hemo 39
pcv 51
wc 77
rc 93
htn 1
dm 1
cad 1
appet 0
pe 0
ane 0
classification 0
dtype: int64
ahmed_ds = ahmed_ds.dropna()
ahmed_ds.isnull().sum()
id 0
age 0
bp 0
sg 0
al 0
su 0
rbc 0
pc 0
pcc 0
ba 0
bgr 0
bu 0
sc 0
sod 0
pot 0
hemo 0
pcv 0
wc 0
rc 0
htn 0
dm 0
cad 0
appet 0
pe 0
ane 0
classification 0
dtype: int64
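With its default arguments (`how="any"`), pandas' `DataFrame.dropna()` removes every row that has at least one missing value. That is why `describe()` reports a count of 107 rows out of the original 280: since `rbc` alone has only 173 non-null values, at most 173 rows could have survived. The standard-library sketch below mimics that row-wise rule on a few hypothetical records (with `None` standing in for a missing value); the records and the helper name `drop_incomplete` are illustrative assumptions, not data from this dataset.

```python
def drop_incomplete(rows):
    """Keep only rows with no missing (None) value, mirroring the row-wise
    behaviour of pandas' DataFrame.dropna() with its default how='any'."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Hypothetical records using a few of the dataset's column names
rows = [
    {"age": 48.0, "bp": 80.0, "rbc": None, "classification": "ckd"},
    {"age": 62.0, "bp": 80.0, "rbc": "normal", "classification": "ckd"},
    {"age": None, "bp": 70.0, "rbc": "normal", "classification": "notckd"},
    {"age": 45.0, "bp": 60.0, "rbc": "normal", "classification": "notckd"},
]
complete = drop_incomplete(rows)
print(len(rows), "->", len(complete))  # 4 -> 2
```

A single missing field is enough to discard a whole record, which is why listwise deletion can shrink a dataset far more than any one column's missing count suggests.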
ahmed_ds.describe()
              id        age         bp         sg         al         su        bgr         bu         sc        sod        pot
count 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000 107.000000
mean  273.551402  49.682243  73.084112   1.020047   0.794393   0.233645 130.130841  51.635514   1.971963 138.869159   4.812150
std    99.999362  16.377964  10.764303   0.005429   1.419130   0.759586  54.841123  45.669525   2.600101   7.287990   4.175122
min    11.000000   6.000000  50.000000   1.005000   0.000000   0.000000  70.000000  10.000000   0.400000 114.000000   2.900000
25%   235.500000  38.000000  60.000000   1.020000   0.000000   0.000000  99.000000  27.000000   0.700000 135.000000   3.800000
50%   298.000000  52.000000  70.000000   1.020000   0.000000   0.000000 118.000000  39.000000   1.000000 139.000000   4.600000
75%   352.000000  61.500000  80.000000   1.025000   1.000000   0.000000 131.000000  49.500000   1.250000 144.000000   4.900000
max   399.000000  83.000000 100.000000   1.025000   4.000000   4.000000 380.000000 309.000000  13.300000 150.000000  47.000000
ahmed_ds.columns
Index(['id', 'age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr',
'bu', 'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
'appet', 'pe', 'ane', 'classification'],
dtype='object')
bu sc
0 42.0 1.7
3 25.0 1.0
6 49.0 0.9
10 18.0 1.1
12 20.0 0.5
0 ckd
3 notckd
6 notckd
10 notckd
12 notckd
...
272 notckd
273 ckd
275 ckd
277 notckd
278 notckd
Name: classification, Length: 107, dtype: object
RandomForestClassifier(n_estimators=5, random_state=0)
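By default, scikit-learn's `RandomForestClassifier` trains each of its trees (here `n_estimators=5`) on a bootstrap sample of the training rows (`bootstrap=True`) and also considers only a random subset of features at each split. This is what makes it a bagging ensemble of decision trees. The standard-library sketch below computes the expected fraction of distinct rows in one bootstrap sample, 1 - (1 - 1/n)^n, and checks it by simulation. The training-set size n = 80 is an assumption: it is what `test_size=0.25` would give on the 107 complete rows, which is consistent with the 27 test records in the classification report, but the split call itself does not appear above.

```python
import math
import random


def expected_unique_fraction(n):
    """Expected share of distinct rows in a bootstrap sample of size n.

    A given row is missed by one draw with probability (1 - 1/n), so it is
    missing from all n independent draws with probability (1 - 1/n) ** n.
    """
    return 1 - (1 - 1 / n) ** n


def simulated_unique_fraction(n, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        distinct = {rng.randrange(n) for _ in range(n)}  # rows that were drawn
        total += len(distinct) / n
    return total / trials


n_train = 80  # assumption: 107 complete rows split with test_size=0.25
print(round(expected_unique_fraction(n_train), 4))  # 0.6344
print(round(simulated_unique_fraction(n_train), 3))
print(round(1 - 1 / math.e, 4))  # 0.6321, the large-n limit
```

So each tree sees only about 63% of the distinct training rows. The remaining rows are "out-of-bag" for that tree, and the differences between these samples are what decorrelate the trees.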
CONFUSION MATRIX:
[[ 5 1]
[ 0 21]]
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
ACCURACY SCORE:
0.9630
CLASSIFICATION REPORT:
                ckd     notckd   accuracy  macro avg  weighted avg
precision  1.000000   0.954545   0.962963   0.977273      0.964646
recall     0.833333   1.000000   0.962963   0.916667      0.962963
f1-score   0.909091   0.976744   0.962963   0.942918      0.961710
support    6.000000  21.000000   0.962963  27.000000     27.000000
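Every number in the classification report above can be derived from the 2×2 confusion matrix. For each class, precision is TP / (column total), recall is TP / (row total), and F1 is their harmonic mean. The macro average is the plain mean over classes, and the weighted average weights each class by its support. The standard-library sketch below (the helper name `report_from_matrix` is hypothetical) reproduces the reported values, e.g. accuracy = 26/27 and F1 for `ckd` = 10/11.

```python
def report_from_matrix(cm, labels):
    """Per-class precision/recall/F1 plus macro and weighted averages,
    computed from a confusion matrix with true classes in rows."""
    n = sum(sum(row) for row in cm)
    k = len(labels)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column total
        actual = sum(cm[i])                           # row total = support
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1-score": f1, "support": actual}
    accuracy = sum(cm[i][i] for i in range(k)) / n
    metrics = ("precision", "recall", "f1-score")
    report["macro avg"] = {m: sum(report[l][m] for l in labels) / k for m in metrics}
    report["weighted avg"] = {
        m: sum(report[l][m] * report[l]["support"] for l in labels) / n for m in metrics
    }
    return accuracy, report


accuracy, report = report_from_matrix([[5, 1], [0, 21]], ["ckd", "notckd"])
print(f"accuracy: {accuracy:.4f}")  # accuracy: 0.9630
for name, row in report.items():
    print(name, {m: round(v, 6) for m, v in row.items()})
```

With only 6 CKD records in the test set, the single missed CKD case accounts for the entire drop in `ckd` recall (5/6), so these scores should be read with that small sample in mind.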
<ipython-input-29-5c8bfb494df1>:8: UserWarning: *c* argument looks like a single numeric RGB or RGBA sequence, which should be avoid
plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)
Conclusion
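In conclusion, bagging is a versatile and effective ensemble learning technique that combines the predictions of multiple base learners trained on bootstrapped subsets of the data, improving generalization mainly by reducing variance. In this practical, a Random Forest (a bagging ensemble of decision trees) with n_estimators=5 correctly classified 26 of the 27 test records (accuracy 0.963).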